We study the problem of deciding satisfiability of first order logic queries over views, our aim being 
to delimit the boundary between the decidable and the undecidable fragments of this language. 
Views currently occupy a central place in database research, due to their role in applications 
such as information integration and data warehousing. Our main result is the identification of a 
decidable class of first order queries over unary conjunctive views that generalises the decidability 
of the classical class of first order sentences over unary relations, known as the Lowenheim class. 
We then demonstrate how various extensions of this class lead to undecidability and also provide 
some expressivity results. Besides its theoretical interest, our new decidable class is potentially 
interesting for use in applications such as deciding implication of complex dependencies, analysis 
of a restricted class of active database rules, and ontology reasoning. 
1. INTRODUCTION 

The study of views in relational databases has attracted much attention over the 
years. Views are an indispensable component for activities such as data integration 
and data warehousing [Widom 1995; Garcia-Molina et al. 1995; Levy et al. 1996], 
where they can be used as "mediators" for source information that is not directly 
accessible to users. This is especially helpful in modelling the integration of data 
from diverse sources, such as legacy systems and/or the world wide web. 

Much of the research related to views has addressed fundamental problems such 
as containment and rewriting/optimisation of queries using views (e.g. see [Ullman 
1997; Halevy 2001]). In this paper, we examine the use of views in a somewhat
different context, where they are used as the basic unit for writing logical expressions.
We provide results on the associated decision problem, satisfiability, for a range of
possible view definitions. In particular, for the case where views are monadic/unary
conjunctive queries, we show that the corresponding query logic is decidable. This
corresponds to an interesting new fragment of first order logic. On the application
side, this decidable query language also has potential applications in areas such as
implication of complex dependencies, ontology reasoning and termination results
for active rules.

1.1 Informal Statement of the Problem 

Consider a relational vocabulary $R_1, \ldots, R_p$ and a set of views $V_1, \ldots, V_n$. Each
view definition corresponds to a first order formula over the vocabulary. Some
example views (using horn clause style notation) are

$V_1(x_1, y_1) \leftarrow R_1(x_1, y_1), R_2(y_1, y_1, z_1), R_3(z_1, z_2, x_1), \neg R_4(z_2, x_1)$
$V_2(z_1) \leftarrow R_1(z_1, z_1)$

Each such view can be expanded into a first order formula, e.g. $V_1(x_1, y_1) \Leftrightarrow
\exists z_1, z_2 (R_1(x_1, y_1) \wedge R_2(y_1, y_1, z_1) \wedge R_3(z_1, z_2, x_1) \wedge \neg R_4(z_2, x_1))$. A first order view
query is a first order formula expressed solely in terms of the given views, e.g.
$q_1 = \exists x_1, y_1 ((V_1(x_1, y_1) \vee V_1(y_1, x_1)) \wedge \neg V_2(x_1)) \wedge \forall z_1 (V_2(z_1) \Rightarrow V_1(z_1, z_1))$ is an
example first order view query, but $q_2 = \exists x_1, y_1 (V_1(x_1, y_1) \vee R_1(y_1, x_1))$ is not, since
it mentions a base relation directly.
By expanding the view definitions, every first order view query can clearly be
rewritten to eliminate the views. Hence, first order view queries can be thought of
as a fragment of first order logic, with the exact nature of the fragment varying
according to how expressive the views are permitted to be.

From a database perspective, first order view queries are particularly suited to 
applications where the source data is unavailable, but summary data (in the form 
of views) is. Since many database and reasoning languages are based on first order 
logic (or extensions thereof), this makes it a useful choice for manipulating the 
views. 

Our purpose in this paper is to determine for which types of view definitions
satisfiability (over both finite and infinite models) is decidable for the language. If
views can be binary, then this language is clearly as powerful as first order logic
over binary base relations, and hence undecidable (see [Boerger et al. 1996]). The
situation becomes far more interesting when we restrict the form that views may
take, in particular when they are required to be unary. Such a restriction has the
effect of constraining which parts of the underlying database can be "seen" by the 
view formula and also constrains how such parts may be connected. 

1.2 Contributions 

The main contribution of this paper is the definition of a language called the first
order unary conjunctive view language (UCV) and a proof of its decidability. As
its name suggests, it uses unary views defined by conjunctive queries (footnote 2). We
demonstrate that it is a maximal decidable class, in the sense that increasing the
expressiveness of the view definitions results in undecidability. Some interesting
aspects of this decidability result are:

- It is well known that first order logic solely over monadic relations is decidable
[Lowenheim 1915], but the extension to dyadic relations is undecidable [Borger
et al. 1997]. The first order unary conjunctive view language can be seen as
an interesting intermediate case between the two, since although only monadic
predicates (views) appear in the query, they are intimately related to database
relations of higher arity.

- The language is able to express some interesting properties, which might be
applied to various kinds of reasoning over ontologies. It can also be thought of
as a powerful generalisation of unary inclusion dependencies [Cosmadakis et al.
1990]. Furthermore, it has an interesting characterisation as a decidable class of
rules (triggers) for active databases.

2 More generally, views may be any existential positive formulas with one free variable,
since such a formula can be rewritten into a disjunction of conjunctive formulas with
one free variable.

To briefly give a feel for this decidable language, we next provide some example 
unary conjunctive views and a first order unary conjunctive view query defined over 
them: 

$V_1(x) \leftarrow R_1(x, y), R_2(y, z), R_3(z, x'), R_4(x', x)$

$V_2(x) \leftarrow R_1(x, y), R_1(x, z), R_1(y, z)$

$V_3(x) \leftarrow R_1(x, y), R_1(x, z), R_1(y, y), R_1(z, x)$

$V_4(x) \leftarrow R_1(x, y), R_3(y, z), R_4(z, x'), R_4(x', y'), R_3(y', x)$

$\exists x (V_2(x) \wedge \neg V_1(x)) \wedge \neg \exists y (V_3(y) \wedge \neg V_4(y))$

1.3 Paper Outline 

The paper is structured as follows: Section 2 defines the necessary preliminaries 
and background concepts. Section 3 presents the definition of the logic UCV. 
Section 4 is the core section of the paper, where the decidability result for the class 
UCV is proved. Section 5 shows that extensions to the language, such as allowing 
negation, inequality or recursion in views, result in undecidability. Section 6 covers 
applications of the decidability results and then Section 7 provides some results 
on expressivity. Section 8 discusses related work and section 9 summarises and 
discusses future work. 

2. PRELIMINARIES 

In this section, we state basic definitions and relevant results. The reader is assumed 
to be familiar with standard results and notations from mathematical logic (e.g. 
see [Enderton 2001]). In the following, formulas are always first-order. The symbol
FO denotes the set of first order formulas over any vocabulary $\sigma$. In addition, if
$\mathcal{L} \subseteq$ FO (i.e. $\mathcal{L}$ is a fragment of FO), we denote by $\mathcal{L}(\sigma)$ the set of formulas in $\mathcal{L}$
over the vocabulary $\sigma$.

2.1 First-order logic 

A (relational) vocabulary $\sigma$ is a tuple $(R_1, \ldots, R_n)$ of relation symbols with each
$R_i$ associated with a specified arity $r_i$. A (relational) $\sigma$-structure $\mathbf{A}$ is the tuple

$(A; R_1^{\mathbf{A}}, \ldots, R_n^{\mathbf{A}})$

where $A$ is a non-empty set, called the universe (of $\mathbf{A}$), and $R_i^{\mathbf{A}}$ is an $r_i$-ary relation
over $A$ interpreting $R_i$. We refer to the elements in the set $A$ as the elements in
$\mathbf{A}$, or simply as constants (of $\mathbf{A}$; see footnote 3). In the sequel, we write $R_i$ instead of
$R_i^{\mathbf{A}}$ when the meaning is clear from the context. We also use $\mathrm{STRUCT}(\sigma)$ to denote the
set of all $\sigma$-structures. We assume a countably infinite set VAR of variables. An
instantiation (or valuation) of a structure $\mathbf{I}$ is a function $v : \mathrm{VAR} \to I$. We extend
this function to free tuples (i.e. tuples of variables) in the obvious way. We use the
usual Tarskian notion of satisfaction to define $\mathbf{I} \models \phi[v]$, i.e., whether $\phi$ is true in
$\mathbf{I}$ under $v$. If $\phi$ is a sentence, we simply write $\mathbf{I} \models \phi$. The image of a structure $\mathbf{I}$
under a formula $\phi(x_1, \ldots, x_n)$ is

$\phi(\mathbf{I}) = \{ v(x_1, \ldots, x_n) : v \text{ is an instantiation of } \mathbf{I}, \text{ and } \mathbf{I} \models \phi[v] \}$.

In particular, if $n = 0$, we have that $\phi(\mathbf{I}) \neq \emptyset$ iff $\mathbf{I} \models \phi$. We say that two $\sigma$-structures
$\mathbf{A}$ and $\mathbf{B}$ agree on $\mathcal{L}$ iff for all $\phi \in \mathcal{L}(\sigma)$ we have $\mathbf{A} \models \phi \Leftrightarrow \mathbf{B} \models \phi$.

Following the convention in database theory, the (tuple) database $\mathcal{D}(\mathbf{A})$ corresponding
to the structure $\mathbf{A}$ (defined above) is the set

$\{ R_i(t) : 1 \leq i \leq n \text{ and } t \in R_i^{\mathbf{A}} \}$.

It is easy to see that such a database can be considered a structure with universe
$adom(\mathbf{A})$, which is defined to be the set of all elements of $\mathbf{A}$ occurring in at least one
relation $R_i$, and relations built appropriately from $\mathcal{D}(\mathbf{A})$. Abusing terminology,
we refer to the elements of $\mathcal{D}(\mathbf{A})$ as tuples (associated with $\mathbf{A}$). In addition, when
the meaning is clear from the context, we shall also abuse the term free tuple to
mean an atomic formula $R(u)$, where $R \in \sigma$ and $u$ is a tuple of variables.

A formula $\phi$ is said to be satisfiable if there exists a structure $\mathbf{A}$ (either of finite
or infinite size) such that $\phi(\mathbf{A}) \neq \emptyset$; such a structure is said to be a model for
$\phi$. We say that $\phi$ is finitely satisfiable if there exists a finite structure $\mathbf{I}$ such that
$\phi(\mathbf{I}) \neq \emptyset$. Without loss of generality, we shall focus only on sentences when we are
dealing with the satisfiability problem. In fact, if $\phi$ has some free variables, taking
its existential closure preserves satisfiability [Indeed we shall see that the languages
we consider are closed under first-order quantification].

Given two $\sigma$-structures $\mathbf{A}, \mathbf{B}$, recall that $\mathbf{A}$ is a substructure of $\mathbf{B}$ (written $\mathbf{A} \subseteq \mathbf{B}$)
if $A \subseteq B$ and $R^{\mathbf{A}} \subseteq R^{\mathbf{B}}$ for every relation symbol $R$ in $\sigma$. We say that $\mathbf{A}$ is an
induced substructure of $\mathbf{B}$ (i.e. induced by $A \subseteq B$) if for every relation symbol
$R$ in $\sigma$, $R^{\mathbf{A}} = R^{\mathbf{B}} \cap A^r$, where $r$ is the arity of $R$. Now, a homomorphism from
$\mathbf{A}$ to $\mathbf{B}$ is a function $h : A \to B$ such that, for every relation symbol $R$ in $\sigma$ and
$a = (a_1, \ldots, a_r) \in R^{\mathbf{A}}$, it is the case that $h(a) = (h(a_1), \ldots, h(a_r)) \in R^{\mathbf{B}}$. An
isomorphism is a bijective homomorphism whose inverse is a homomorphism.

The quantifier rank $qrank(\phi)$ of a formula $\phi$ is the maximum nesting depth of
quantifiers in $\phi$.

3 Although it is common in mathematical logic to use the term "constants" to mean the
interpretation of constant symbols in the structure, no confusion shall arise in this article, as we assume
the absence of constant symbols in the vocabulary. Our results, nevertheless, easily extend to
vocabularies with constant symbols.

2.2 Views

For our purpose, a view over $\sigma$ can be thought of as an arbitrary FO formula over
$\sigma$. We say that a view $V$ is conjunctive if it can be written as a conjunctive query,
i.e. of the form

$\exists x_1, \ldots, x_n (R_1(u_1) \wedge \ldots \wedge R_k(u_k))$

where each $R_i$ is a relation symbol, and each $u_i$ is a free tuple of appropriate arity.
We adopt the horn clause style notation for writing conjunctive views. For example,
if $\{y_1, \ldots, y_n\}$ is the set of free variables in the above conjunctive query, then we
can rewrite it as

$V(y_1, \ldots, y_n) \leftarrow R_1(u_1), \ldots, R_k(u_k)$

where $V(y_1, \ldots, y_n)$ is called the head of $V$, and the conjunction $R_1(u_1), \ldots, R_k(u_k)$
the body of $V$. The length of the conjunctive view $V$ is defined to be the sum of
the arities of the relation symbols in the multiset $\{R_1, \ldots, R_k\}$. For example, the
lengths of the two views $V$ and $V'$ defined as

$V(x) \leftarrow E(x, y)$
$V'(x) \leftarrow E(x, y), E(y, z)$

are, respectively, two and four. Additionally, if $n = 1$ (i.e. the view has a head of arity 1),
the view is said to be unary. Unless stated otherwise, we shall say "view" to mean
"unary conjunctive view with neither equality nor negation in its body".

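To make the definition of length concrete, the following Python sketch encodes the two views $V(x) \leftarrow E(x,y)$ and $V'(x) \leftarrow E(x,y), E(y,z)$ and computes their lengths. The class name `View`, its fields and the encoding of atoms are illustrative assumptions of this sketch, not notation used elsewhere in the paper.

```python
from dataclasses import dataclass
from typing import Tuple

# An atom R(u) is encoded as (relation name, tuple of variable names).
# This encoding is an assumption of the sketch, not the paper's notation.
Atom = Tuple[str, Tuple[str, ...]]


@dataclass(frozen=True)
class View:
    head: Tuple[str, ...]   # free variables y_1, ..., y_n
    body: Tuple[Atom, ...]  # conjuncts R_1(u_1), ..., R_k(u_k)

    def length(self) -> int:
        # Sum of the arities of the relation symbols in the body (as a multiset).
        return sum(len(args) for _, args in self.body)

    def is_unary(self) -> bool:
        # A view is unary when its head has arity one.
        return len(self.head) == 1


# V(x) <- E(x,y)   and   V'(x) <- E(x,y), E(y,z)
V = View(head=("x",), body=(("E", ("x", "y")),))
V_prime = View(head=("x",), body=(("E", ("x", "y")), ("E", ("y", "z"))))

print(V.length(), V_prime.length())  # 2 4
```

Both views are unary, and the computed lengths agree with the values two and four stated in the text.
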
2.3 Graphs 

We use standard definitions from graph theory (e.g. see [Diestel 2005]). A graph is
a structure $\mathbf{G} = (G, E)$ where $E$ is a binary relation. The girth of a graph is the
length of its shortest cycle. For two vertices $x, y \in G$, we denote their distance by
$d_{\mathbf{G}}(x, y)$ (or just $d(x, y)$ when $\mathbf{G}$ is clear from the context). For two sets $S_1$ and
$S_2$ of vertices in $\mathbf{G}$, we define their distance to be

$d_{\mathbf{G}}(S_1, S_2) := \min\{ d_{\mathbf{G}}(a, b) : a \in S_1 \text{ and } b \in S_2 \}$.

In a weighted graph $\mathbf{G}$ with weight $w_{\mathbf{G}} : E \to \mathbb{N}$, the weight $w_{\mathbf{G}}(P)$ of a path $P$
in $\mathbf{G}$ is just $\sum_{e \in E(P)} w_{\mathbf{G}}(e)$. We shall write $w$ instead of $w_{\mathbf{G}}$ if the meaning is
clear from the context. In the sequel, we shall frequently mention trees and forests.
We always assume that any tree has a selected node, which we call a root of the
tree. Given a tree $\mathbf{T} = (T, E)$, we can partition $T$ according to the distance of the
vertices from the root.

The Gaifman graph (see [Gaifman 1982]) associated with a structure $\mathbf{A}$ is the
weighted undirected multi-graph $\mathcal{G}(\mathbf{A}) = (G, E)$ such that:

(1) $G = A$.

(2) The multi-set $E$ is defined as follows: for each $x, y \in G$, we put an $R(t)$-labeled
edge $xy$ in $E$ with weight $r$ (the arity of $R$) iff $x$ and $y$ appear in a tuple $R(t)$
in $\mathcal{D}(\mathbf{A})$. [Notice that the multiplicity of $xy$ in $E$ depends on the number of
tuples in $\mathcal{D}(\mathbf{A})$ that contain both $x$ and $y$ as their arguments.]

Note also that the subgraph of $\mathcal{G}(\mathbf{A})$ induced by the set of all elements of $\mathbf{A}$ in
a tuple $t$ is the complete graph $K_r$, and so an $L$-labelled edge is adjacent to an
edge $e \in E$ iff all $L$-labelled edges are adjacent (i.e. connected) to the edge $e$.
For any $a, b \in A$, we define the distance $d_{\mathbf{A}}(a, b)$ between $a$ and $b$ to be their
distance in $\mathcal{G}(\mathbf{A})$. Also, extend this distance function to tuples and sets of tuples
by interpreting them as sets of elements of $\mathbf{A}$ that appear in them. Any pair of
tuples $R(t)$ and $R'(t')$ in $\mathcal{D}(\mathbf{A})$ are said to be connected (in $\mathbf{A}$) if in $\mathcal{G}(\mathbf{A})$ some
(and hence all) $R(t)$-labeled edge is adjacent to some (and hence all) $R'(t')$-labeled
edge.
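The construction of the edge multi-set can be sketched in Python as follows. The function name `gaifman_edges` and the encoding of a database as a list of (relation name, tuple of constants) pairs are assumptions made for illustration; the sketch also assumes that edges are only placed between distinct constants.

```python
from itertools import combinations


def gaifman_edges(database):
    """Return the labelled, weighted edge multi-set of the Gaifman graph.

    `database` is an iterable of tuples R(t), each encoded as
    (relation_name, tuple_of_constants). Every returned edge has the form
    (x, y, label, weight), where label is the tuple R(t) that produced the
    edge and weight is the arity of R.
    """
    edges = []
    for rel, args in database:
        weight = len(args)
        # One edge per unordered pair of distinct constants in the tuple.
        for x, y in combinations(sorted(set(args)), 2):
            edges.append((x, y, (rel, args), weight))
    return edges


# A small database with one ternary tuple and two binary tuples.
db = [("R", (1, 2, 3)), ("E", (3, 4)), ("F", (3, 4))]
edges = gaifman_edges(db)
multiplicity_34 = sum(1 for e in edges if e[:2] == (3, 4))
```

In this example the tuple R(1,2,3) contributes a triangle of weight-3 edges, while the pair 3, 4 receives two parallel edges, one labelled E(3,4) and one labelled F(3,4), which illustrates why $E$ is a multi-set.
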

2.4 Unary formulas 

A unary formula is an arbitrary FO formula without equality such that each of its
relation symbols has arity one. Let $\sigma$ be a vocabulary whose relation symbols are
of arity one. We shall use $\mathrm{UFO}(\sigma)$ to denote the set of all unary formulas without
equality over $\sigma$. Also, we define $\mathrm{UFO} = \bigcup_{\sigma} \mathrm{UFO}(\sigma)$. The following lemma will be
useful for proving expressiveness results in Section 7.

Lemma 2.1. For every unary sentence, there exists an equivalent one of quantifier
rank 1.

PROOF. By a straightforward manipulation. See the proof of lemma 21.12 in
[Boolos et al. 2002]. [Their proof actually gives more than the result they claim.
In fact, their construction converts an arbitrary unary sentence into one with one
unary variable and of quantifier rank 1.] □

2.5 Ehrenfeucht-Fraïssé Games

We shall need a limited form of Ehrenfeucht-Fraïssé games; for a general account,
the reader may consult [Libkin 2004]. The games are played by two players, Spoiler
and Duplicator, on two $\sigma$-structures $\mathbf{A}$ and $\mathbf{B}$. The goal of Spoiler is to show that
the structures are different, while Duplicator aims to show that they are the same.
The game consists of a single round. Spoiler chooses a structure (say, $\mathbf{A}$) and an
element $a$ in it, after which Duplicator has to respond by choosing an element $b$ in
the other structure $\mathbf{B}$. Duplicator wins the game iff the substructure of $\mathbf{A}$ induced
by $\{a\}$ is isomorphic to the substructure of $\mathbf{B}$ induced by $\{b\}$. Duplicator has
a winning strategy iff Duplicator has a winning move, regardless of how Spoiler
behaves.

Proposition 2.2 (Ehrenfeucht-Fraïssé Games). Duplicator has a winning
strategy on $\mathbf{A}$ and $\mathbf{B}$ iff $\mathbf{A}$ and $\mathbf{B}$ agree on first-order formulas over $\sigma$ of quantifier
rank 1.
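For vocabularies whose relation symbols are all unary, the single-round game admits a very simple check, sketched below in Python. The substructure induced by a single element is determined by the set of predicates that hold of it, so Duplicator has a winning strategy exactly when both structures realise the same collection of such sets. The encoding of a structure as a pair (universe, dictionary of predicate extensions) and all function names are assumptions of this sketch.

```python
def atomic_type(structure, a):
    """Set of unary predicate names that hold of element a."""
    _, relations = structure
    return frozenset(name for name, ext in relations.items() if a in ext)


def realised_types(structure):
    """All atomic types realised by some element of the universe."""
    universe, _ = structure
    return {atomic_type(structure, a) for a in universe}


def duplicator_wins(A, B):
    # Whatever element Spoiler picks in either structure, Duplicator needs an
    # element of the same atomic type in the other structure.
    return realised_types(A) == realised_types(B)


A = ({1, 2, 3}, {"P": {1, 2}, "Q": {1}})
B = ({"a", "b", "c", "d"}, {"P": {"a", "b", "c"}, "Q": {"a"}})
C = ({"a", "b"}, {"P": {"a", "b"}, "Q": {"a"}})

print(duplicator_wins(A, B))  # True
print(duplicator_wins(A, C))  # False
```

On A and C Spoiler wins by choosing the element 3 of A, which satisfies neither predicate, since every element of C satisfies P.
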

2.6 Other Notation 

Regarding other notation we shall use throughout the rest of the paper: we shall use
$a, b$ for constants, $x, y, z$ for variables, $u$ for free tuples, $U, V$ for views, $\mathcal{U}, \mathcal{V}$ for sets
of views, $\sigma$ for vocabularies, $R_1, R_2, \ldots$ for relation symbols, $\mathbf{A}, \mathbf{B}, \ldots$ for structures
and $A, B$ for their respective universes. If $\mathcal{D}$ is a database (a set of tuples), we use
$adom(\mathcal{D})$ to denote the set of constants in $\mathcal{D}$. Finally, given a constant $a \in adom(\mathcal{D})$ and a
"new" constant $b \notin adom(\mathcal{D})$, we define $\mathcal{D}[b/a]$ to be the database that is obtained from
$\mathcal{D}$ by replacing every occurrence of $a$ by $b$. The notation $\mathcal{D}[b_1/a_1, \ldots, b_n/a_n]$ is
defined in the same way.

3. DEFINITION OF FIRST ORDER UNARY-CONJUNCTIVE-VIEW LOGIC 

Let $\sigma$ be an arbitrary vocabulary, and $\mathcal{V}$ be a finite set of (unary conjunctive)
views over $\sigma$, which we refer to as a $\sigma$-view set. We now inductively define the set
$\mathrm{UCV}(\sigma, \mathcal{V})$ of first order unary-conjunctive-view (UCV) queries/formulas over the
vocabulary $\sigma$ and a $\sigma$-view set $\mathcal{V}$:

(1) if $V \in \mathcal{V}$, then $V(x) \in \mathrm{UCV}(\sigma, \mathcal{V})$; and

(2) if $\phi, \psi \in \mathrm{UCV}(\sigma, \mathcal{V})$, then the formulas $\neg\phi$, $\phi \wedge \psi$, and $\exists x \phi$ belong to $\mathrm{UCV}(\sigma, \mathcal{V})$.

The smallest set of so-constructed formulas defines the set $\mathrm{UCV}(\sigma, \mathcal{V})$. We denote
the set of all UCV formulas over the vocabulary $\sigma$ by $\mathrm{UCV}(\sigma)$, i.e. $\mathrm{UCV}(\sigma) \stackrel{\text{def}}{=}
\bigcup_{\mathcal{V}} \mathrm{UCV}(\sigma, \mathcal{V})$ where $\mathcal{V}$ may be any $\sigma$-view set. Further, the set of all UCV queries
is denoted by UCV, i.e. $\mathrm{UCV} \stackrel{\text{def}}{=} \bigcup_{\sigma} \mathrm{UCV}(\sigma)$, where $\sigma$ is any vocabulary. As
usual, we use the shorthands $\phi \vee \psi$, $\phi \to \psi$, $\phi \leftrightarrow \psi$, and $\forall x \phi$ for (respectively)
$\neg(\neg\phi \wedge \neg\psi)$, $\neg\phi \vee \psi$, $(\phi \to \psi) \wedge (\psi \to \phi)$, and $\neg\exists x \neg\phi$. Thus, the UCV language is
closed under boolean combinations and first-order quantifications. As an example,
consider the UCV formula

$q_1 = \exists x (V(x) \wedge \neg V'(x))$

where $V$ and $V'$ are defined as

$V(x) \leftarrow E(x, y)$
$V'(x) \leftarrow E(x, y), E(y, z)$

This formula asserts that there exists a vertex from which there is an outgoing arc,
but no outgoing directed walk of length 2.

Let us make a few remarks on the expressive power of the logic UCV with respect
to other logics. It is easy to see that the UCV language strictly subsumes UFO
(the Lowenheim class without equality [Lowenheim 1915; Borger et al. 1997]), as
UCV queries can be defined over any relational vocabulary (i.e. including ones
that include $k$-ary relation symbols with $k > 1$). It is also easy to see that allowing
any general existential positive formula (i.e. of the form $\exists x\, \varphi(x)$ where $\varphi$ is a
quantifier-free formula with no negation) with one free variable does not increase
the expressive power of the logic. Indeed, the quantifier-free subformula $\varphi$ can be
rewritten in disjunctive normal form without introducing negation, after which we
may distribute the existential quantifier across the disjunctions and consequently
transform the entire formula into a disjunction of conjunctive queries with one or zero
free variables. Each such conjunctive query can then be treated as a view.
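As a small worked instance of this rewriting, consider the following existential positive formula with one free variable $x$ over the graph vocabulary. The formula $\varphi$ and the view names $W_1$, $W_2$ are illustrative choices, not examples taken from elsewhere in the paper.

```latex
\begin{align*}
\varphi(x) &= \exists y, z\, \bigl( E(x,y) \wedge ( E(y,y) \vee E(y,z) ) \bigr) \\
  &\equiv \exists y, z\, \bigl( ( E(x,y) \wedge E(y,y) ) \vee ( E(x,y) \wedge E(y,z) ) \bigr)
      && \text{(disjunctive normal form)} \\
  &\equiv \exists y\, ( E(x,y) \wedge E(y,y) ) \;\vee\; \exists y, z\, ( E(x,y) \wedge E(y,z) )
      && \text{(distribute $\exists$ over $\vee$)} \\
  &\equiv W_1(x) \vee W_2(x),
\end{align*}
\text{where } W_1(x) \leftarrow E(x,y), E(y,y) \quad\text{and}\quad W_2(x) \leftarrow E(x,y), E(y,z).
```

The quantifier $\exists z$ can be dropped from the first disjunct because $z$ does not occur in it and universes are non-empty. The result is a boolean combination of unary conjunctive views, hence a UCV formula.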

There are two ways in which we can interpret a UCV formula. The standard
way is to think of a UCV query as an FO formula over the underlying vocabulary.
Take the query $q_2 = \exists x (V'(x) \wedge \neg V(x))$ as an example. We can interpret this query as
the formula

$\exists x (\exists y, z (E(x, y) \wedge E(y, z)) \wedge \neg \exists y\, E(x, y))$

over the graph vocabulary. The non-standard way is to regard a UCV query as a
unary formula over the view set. For example, we can think of $q_2$ as a unary formula
over the vocabulary $\sigma' = (V, V')$. Now, if $\phi \in \mathrm{UCV}(\sigma, \mathcal{V})$, then we denote by $\phi_{\mathcal{V}}$
the unary formula over $\mathcal{V}$ corresponding to $\phi$ in the non-standard interpretation of
UCV queries. However, for notational convenience, we shall write $\phi$ instead of $\phi_{\mathcal{V}}$
when the meaning is clear from the context. Given a vocabulary $\sigma$ and a $\sigma$-view set
$\mathcal{V} = \{V_1, \ldots, V_n\}$, we may define the function $\Lambda : \mathrm{STRUCT}(\sigma) \to \mathrm{STRUCT}(\mathcal{V})$
such that for any $\mathbf{I} \in \mathrm{STRUCT}(\sigma)$

$\Lambda(\mathbf{I}) \stackrel{\text{def}}{=} (I; V_1^{\Lambda(\mathbf{I})}, \ldots, V_n^{\Lambda(\mathbf{I})})$

where $V_i^{\Lambda(\mathbf{I})} = V_i(\mathbf{I})$. For example, let $\sigma = (E)$ and $\mathcal{V} = \{V, V'\}$ be as above, and
let

$\mathbf{I} = (\{1, 2, 3, 4\}; E = \{(1, 2), (2, 3), (3, 4)\})$.

Then, we have

$\mathbf{J} \stackrel{\text{def}}{=} \Lambda(\mathbf{I}) = (\{1, 2, 3, 4\}; V^{\mathbf{J}} = \{1, 2, 3\}, {V'}^{\mathbf{J}} = \{1, 2\})$.

In the following, we shall reserve the symbol $\Lambda$ to denote this special function.
In addition, if $\mathbf{J} \in \mathrm{STRUCT}(\mathcal{V})$ and there exists a structure $\mathbf{I} \in \mathrm{STRUCT}(\sigma)$
such that $\Lambda(\mathbf{I}) = \mathbf{J}$, we say that the structure $\mathbf{J}$ is realizable with respect to the
vocabulary $\sigma$ and the view set $\mathcal{V}$, or that $\mathbf{I}$ realizes $\mathbf{J}$. We shall omit mention of $\sigma$
and $\mathcal{V}$ if they are understood by context.
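The passage from a structure to its view image can be sketched in Python by brute-force evaluation of each conjunctive view. The function names `eval_unary_view` and `view_image`, and the encoding of views as (head variable, list of body atoms), are assumptions of this sketch; it also assumes that the head variable occurs in the body. The data is the four-element path used in the example above, with views $V(x) \leftarrow E(x,y)$ and $V'(x) \leftarrow E(x,y), E(y,z)$.

```python
from itertools import product


def eval_unary_view(head_var, body, universe, relations):
    """Return {a : some assignment with head_var = a satisfies every body atom}."""
    variables = sorted({v for _, args in body for v in args})
    answers = set()
    for values in product(universe, repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(tuple(env[v] for v in args) in relations[rel] for rel, args in body):
            answers.add(env[head_var])
    return answers


def view_image(views, universe, relations):
    """Interpret each view name by the answer set of the view on the structure."""
    return {name: eval_unary_view(head, body, universe, relations)
            for name, (head, body) in views.items()}


universe = {1, 2, 3, 4}
relations = {"E": {(1, 2), (2, 3), (3, 4)}}
views = {
    "V": ("x", [("E", ("x", "y"))]),
    "V'": ("x", [("E", ("x", "y")), ("E", ("y", "z"))]),
}

J = view_image(views, universe, relations)

# Evaluate the unary formula "exists x (V(x) and not V'(x))" directly over J.
exists_V_not_Vprime = any(a in J["V"] and a not in J["V'"] for a in universe)
print(J["V"], J["V'"], exists_V_not_Vprime)  # {1, 2, 3} {1, 2} True
```

The computed extensions match the structure $\mathbf{J}$ given above. The final check illustrates the non-standard interpretation: once the view image is available, a UCV query is evaluated as a unary formula without consulting $E$ again.
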

A number of remarks about the notion of realizability are in order. First, some
unary structures are not realizable with respect to a given view set $\mathcal{V}$. For example,
the query $\exists x (V'(x) \wedge \neg V(x))$ has infinitely many models if treated as a unary formula, but none of
these models is realizable, since $V'(\mathbf{I}) \subseteq V(\mathbf{I})$ for every structure $\mathbf{I}$. Second, if $\phi \in \mathrm{UCV}(\sigma, \mathcal{V})$ has a model $\mathbf{I}$,
then the structure $\Lambda(\mathbf{I})$ over $\mathcal{V}$ is a model for $\phi_{\mathcal{V}}$. In other words, if a UCV query
is satisfiable, then it is also satisfiable if treated as a unary formula. Conversely,
a UCV query is satisfiable if it is satisfiable when treated
as a unary formula and at least one of its models is realizable. More precisely,
if $\Lambda(\mathbf{I})$ is a model for $\phi_{\mathcal{V}}$, then $\mathbf{I}$ is a model for $\phi$. So, combining these, we have
$\mathbf{I} \models \phi$ iff $\Lambda(\mathbf{I}) \models \phi_{\mathcal{V}}$. So, we immediately have the following lemma:

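Lemma 3.1. Suppose $\mathbf{A}, \mathbf{B} \in \mathrm{STRUCT}(\sigma)$ and $\phi \in \mathrm{UCV}(\sigma, \mathcal{V})$. Then, for
$\Lambda : \mathrm{STRUCT}(\sigma) \to \mathrm{STRUCT}(\mathcal{V})$ defined above, the following statements are
equivalent:

(1) $\mathbf{A} \models \phi$ iff $\mathbf{B} \models \phi$,

(2) $\Lambda(\mathbf{A}) \models \phi_{\mathcal{V}}$ iff $\Lambda(\mathbf{B}) \models \phi_{\mathcal{V}}$.

This lemma is useful when combined with Ehrenfeucht-Fraïssé games. For example,
suppose that we are given a model $\mathbf{A}$ for $\phi$, and we construct a "nicer" structure $\mathbf{B}$
that, we wish, satisfies $\phi$. If we can prove the second statement in the lemma
(which is often easier to establish as views have arity one), we may deduce that
$\mathbf{B} \models \phi$.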

4. DECIDABILITY OF UCV QUERIES 

In this section, we prove our main result that satisfiability is decidable for UCV
formulas. Our main theorem stipulates that UCV has the bounded model property.

Theorem 4.1. Let $\phi$ be a formula in UCV. Suppose, further, that $\phi$ contains
precisely the views in the view set $\mathcal{V}$, and relation symbols in the vocabulary $\sigma$, with
$m$ being the maximum length of the views in $\mathcal{V}$, and $p = |\sigma|$. If $\phi$ is satisfiable, then
it has a model using at most $2^{2^{q(p, m)}}$ elements, for some fixed polynomial $q$ in $p$ and
$m$.

Before we prove this theorem, we first derive some corollaries. Simple algebraic
manipulations yield the following corollary.

Corollary 4.2. Continuing from Theorem 4.1, if $n$ is the size of (the parse
tree of) a satisfiable formula $\phi$, then $\phi$ has a model of size at most $2^{2^{g(n)}}$ for some fixed
polynomial $g$ in $n$.

Corollary 4.2 immediately leads to the decidability of satisfiability for UCV. We
can in fact derive a tighter bound.

Theorem 4.3. Satisfiability for the UCV class of formulas is in 2-NEXPTIME.

This theorem follows immediately from the following proposition and corollary 4.2.

Proposition 4.4. Let $s$ be a non-decreasing function with $s(n) \geq n$. Then, the
problem of determining whether an FO sentence has a model of size at most $s(n)$,
where $n$ is the size of the input formula, can be decided nondeterministically in
$2^{O(n \log(s(n)))}$ steps.

PROOF. We may use any reasonable encoding code($\mathbf{A}$) of a finite structure $\mathbf{A}$ in
bits (e.g. see [Libkin 2004, Chapter 6]). The size of the encoding, denoted $|\mathbf{A}|$, is
polynomial in $|A|$. We first guess a structure $\mathbf{A}$ of size at most $s(n)$. Let $s' = |A|$.
Since the size $|\mathbf{A}|$ of the encoding of $\mathbf{A}$ is polynomial in $s'$, the guessing procedure
takes $O(s^k(n))$ time steps for some constant $k$. We then use the usual procedure
for evaluating whether $\mathbf{A} \models \phi$. This can be done in $O(n \times |\mathbf{A}|^n)$ steps (e.g. see
[Libkin 2004, Proposition 6.6]). Simple algebraic manipulations give the sought
after upper bound. □
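The algebraic manipulations can be spelled out as follows. This is a sketch under two assumptions that go slightly beyond the text: we read $s^k(n)$ as $s(n)^k$, and we write $c$ for a constant such that the encoding size satisfies $|\mathbf{A}| \leq s(n)^c$, which exists because the encoding is polynomial in the universe size.

```latex
\begin{align*}
T_{\mathrm{guess}}(n) &= O\bigl(s(n)^{k}\bigr) = 2^{O(\log s(n))}, \\
T_{\mathrm{eval}}(n)  &= O\bigl(n \cdot |\mathbf{A}|^{n}\bigr)
                       \;\le\; O\bigl(n \cdot s(n)^{c n}\bigr)
                       \;=\; 2^{O(\log n \,+\, c\, n \log s(n))}, \\
T_{\mathrm{guess}}(n) + T_{\mathrm{eval}}(n) &= 2^{O(n \log s(n))}.
\end{align*}
```

The last step uses $s(n) \geq n$, so that both $\log n$ and $\log s(n)$ are dominated by $n \log s(n)$. Substituting the model size bound of Corollary 4.2 for $s(n)$ gives a running time of $2^{O(n \cdot 2^{g(n)})}$, which is doubly exponential in $n$.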

Observe that a lower bound for satisfiability of UCV formulas follows immediately
from the NEXPTIME completeness for satisfiability of UFO formulas given in
[Borger et al. 1997].

Theorem 4.5. Satisfiability for the UCV class of formulas is NEXPTIME hard.

What remains now is to prove theorem 4.1.

PROOF of theorem 4.1. Let $\phi, m, p$ be as stated in theorem 4.1. We begin by
first enumerating all possible views over $\sigma$ of length at most $m$. As we shall see
later in the proof of Subproperty 4.14, doing so helps establish the correctness
of our construction of a finite model, since enumerating all such views effectively
allows us to determine all possible ways the model may be "seen" by views, or parts
of views. Let $\mathcal{U} = \{V_1, \ldots, V_N\}$ be the set of all non-equivalent views obtained. By
elementary counting, one may easily verify that $N \leq m(mp)^m$. Indeed, each view
is composed of its head and its body, whose length is bounded by $m$. The body is
a set of conjuncts that we may fix in some order. There are at most $m$ variables
that the head can take. Each position in the body is a variable ($m$ choices) that is
part of a relation $R$ ($p$ choices). The upper bound is then immediate.

Let $\mathbf{I}_0$ be a (possibly infinite) model for $\phi$. [If it is infinite, by the Lowenheim-Skolem
theorem, we may assume that it is countable.] Without loss of generality,
we may assume that there exists a "universe" relation $U$ in $\mathbf{I}_0$ which contains each
constant in $adom(\mathbf{I}_0)$. Otherwise, if $U' \notin \sigma$ is a unary relation symbol, the
$(\sigma \cup \{U'\})$-structure obtained by adding to $\mathbf{I}_0$ the relation $U'$, which is to be interpreted
as $I_0$, is also a model for $\phi$.

Let us now define $2^N$ formulas $C_0, \ldots, C_{2^N - 1}$ of the form

$C_i(x) := (\neg) V_1(x) \wedge \ldots \wedge (\neg) V_N(x)$

where the conjunct $V_j(x)$ is negated iff the $j$th bit of the binary representation of
$i$ is 0. For each $\mathbf{A} \in \mathrm{STRUCT}(\sigma)$, these formulas induce an equivalence relation
on $A$ with each set $C_i(\mathbf{A})$ being an equivalence class. When $\mathbf{A}$ is clear, we refer to
the equivalence class $C_i(\mathbf{A})$ simply as $C_i$. In addition, the existence of the universe
relation $U$ in $\mathbf{I}_0$ implies that the all-negative equivalence class $C_0$ is empty.

We next describe a sequence of five satisfaction-preserving procedures for deriving
a finite model from $\mathbf{I}_0$. This sequence is best described diagrammatically:

$\mathbf{I}_0 \xrightarrow{\text{makeJF}} \mathbf{I}_1 \xrightarrow{\text{rename1}} \mathbf{I}_2 \xrightarrow{\text{rename2}} \mathbf{I}_3 \xrightarrow{\text{copy}} \mathbf{I}_4 \xrightarrow{\text{prune}} \mathbf{I}_5$.

The $i$th procedure above takes a structure $\mathbf{I}_i$ as input, and outputs another structure
$\mathbf{I}_{i+1}$. The structure $\mathbf{I}_5$ is guaranteed to be finite (and indeed bounded). That each
procedure preserves satisfiability immediately follows by subproperties 4.8, 4.10,
4.12, 4.13, and 4.14. While reading the description of the procedures below, it is
instructive to keep in mind that the property that $C_i(\mathbf{I}_j) = \emptyset$ iff $C_i(\mathbf{I}_{j+1}) = \emptyset$
for every $i$ is sufficient for showing that the $j$th procedure preserves satisfiability (see lemma
4.7).

Roughly speaking, the procedure makeJF transforms the initially given structure
$\mathbf{I}_0$ into another structure that has a forest-like graphical representation, called a
"justification forest". Each subsequent procedure works only on justification forests.
In the sequel, we shall use $\mathcal{H}_i$ to denote our graphical representation of $\mathbf{I}_i$ ($i \in
\{1, \ldots, 5\}$).

The procedure makeJF 

We define the structure $\mathbf{I}_1$ by first defining a sequence $\mathbf{I}_1^0, \mathbf{I}_1^1, \ldots$ of structures such
that $\mathbf{I}_1^k$ is a substructure of $\mathbf{I}_1^{k+1}$, and then setting $\mathbf{I}_1 = \bigcup_{k=0}^{\infty} \mathbf{I}_1^k$. [Note: we take the
normal union, not disjoint union.] We first deal with the base case of $\mathbf{I}_1^0$. For each
non-empty equivalence class $C_i(\mathbf{I}_0)$, we choose a witnessing constant $a_i \in C_i(\mathbf{I}_0)$.
We define $I_1^0$ as the collection of all such $a_i$'s. All relations in $\mathbf{I}_1^0$ are empty. Each
$a_i$ is said to be unjustified in $\mathbf{I}_1^0$, meaning that the model is missing tuples that can
witness the truth of $a_i$ being a member of some equivalence class. We now describe
how to define $\mathbf{I}_1^{k+1}$ from $\mathbf{I}_1^k$. For each $a \in I_1^k$, if $a \in C_i(\mathbf{I}_0)$ for some $i$, it is the case
that $a \in V_j(\mathbf{I}_0)$ iff $bit_j(i) = 1$ for $1 \leq j \leq N$. For such $a$, we may take a minimal
witnessing substructure $\mathbf{S}_a$ of $\mathbf{I}_0$ such that $a \in V_j(\mathbf{S}_a)$ iff $bit_j(i) = 1$. As each
constant in $adom(\mathbf{S}_a)$ appears in at least one relation in $\mathbf{S}_a$, we shall often think
of these witnessing structures as databases (i.e. sets of tuples), and refer to them
as justification sets. We define the structure $\mathbf{I}_1^{k+1}$ to be the union of $\mathbf{I}_1^k$ and all the
witnessing structures $\mathbf{S}_a$ such that $a$ is unjustified in $\mathbf{I}_1^k$. The elements in $I_1^k$ become
justified in $\mathbf{I}_1^{k+1}$. The elements in $I_1^{k+1} - I_1^k$ are then said to be unjustified in $\mathbf{I}_1^{k+1}$.
Observe that the structure $\mathbf{I}_1^{k+1}$ does not unjustify any elements that were justified
in $\mathbf{I}_1^k$, since there is no negation in the view definitions. Finally, the structure $\mathbf{I}_1$ is
defined as the union of all $\mathbf{I}_1^k$'s. Observe that each element in $I_1$ appears in at least
one relation in $\mathbf{I}_1$.

The structure $\mathbf{I}_1$ has an intuitive graphical representation, which we denote by
$\mathcal{H}_1$. The graph $\mathcal{H}_1$ is simply a labeled forest in which each tree $T_i$ (for some
$0 \leq i \leq 2^N - 1$) corresponds to exactly one witnessing constant $a_i$ for each non-empty
$C_i$. We define $T_i$ as follows: the root of $T_i$ is labeled by $S_{a_i} \times C_i$; and for
each $j = 0, 1, \ldots$, any $S_b \times C_k$-labeled node $v$ at level $j$ (for some justification set $S_b$
and equivalence class formula $C_k$), and any constant $c$ in $adom(S_b)$ that is distinct
from $b$, define a new $S_c \times C_{k'}$-labeled node to be a child of $v$, for the unique $k'$
such that $c \in C_{k'}(\mathbf{I}_0)$. In the following, when the meaning is clear, we shall often
refer to an $(S_a \times C_k)$-labeled node simply as an $S_a$-labeled node. Also, observe the
similarity of the construction of $\mathcal{H}_1$ and that of $\mathbf{I}_1$. In fact, the union of all $S_a$, for
which there is an $S_a$-labeled node in $\mathcal{H}_1$, is precisely $\mathbf{I}_1$. Observe also that each
tree $T_i$ may be infinite. For obvious reasons, we shall refer to $T_i$ as a justification
tree (of $a_i$), and to $\mathcal{H}_1$ as a justification forest. In the following, for any justification
tree $T$ and any justification forest $\mathcal{H}$, their corresponding structures (or databases),
denoted by $\mathcal{D}(T)$ and $\mathcal{D}(\mathcal{H})$ respectively, are defined to be the union of all $S_a$, such
that there is an $S_a$-labeled node in, respectively, $T$ and $\mathcal{H}$. Furthermore, we shall
use $adom(T)$ and $adom(\mathcal{H})$ to denote $adom(\mathcal{D}(T))$ and $adom(\mathcal{D}(\mathcal{H}))$, respectively.
The elements in the sets $adom(T)$ and $adom(\mathcal{H})$ are referred to as,
respectively, constants in $T$ and constants in $\mathcal{H}$.

We now illustrate this procedure by a small example. Define the UCV formula

$\phi = \forall x (V_1(x) \wedge \neg V_2(x))$,

where the views are

$V_1(x) \leftarrow E(x, y)$
$V_2(x) \leftarrow E(x, x)$.

Here, we have $\mathcal{V} = \{V_1, V_2\}$, $\sigma = (E)$, and $m = 2$. Suppose that

$\mathbf{I}_0 = (\mathbb{N}; E = \{(0, 1), (1, 2), (2, 3), (3, 4), \ldots\})$

is a path extending indefinitely to the right. Then, we have $\mathbf{I}_0 \models \phi$. Enumerating
all non-equivalent views over $\sigma$ of length at most $m$, we have $\mathcal{U} = \{V_1, V_2, V_3\}$ where

$V_3(x) \leftarrow E(y, x)$.

Now, there are exactly two non-empty equivalence classes:

$C_{100} = \{0\}$

$C_{101} = \{1, 2, \ldots\}$.

Then, we have $S_0 = \{E(0, 1)\}$ and $S_i = \{E(i - 1, i), E(i, i + 1)\}$ for $i > 0$. Following
the above procedure, we obtain the trees $T_{100}$ and $T_{101}$ as depicted in figure 1. Note
that $\mathcal{H}_1$ is the disjoint union of $T_{100}$ and $T_{101}$.
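The partition into equivalence classes can be reproduced mechanically. The Python sketch below computes, for each element, the bit string recording which of $V_1, V_2, V_3$ it satisfies, written in the same left-to-right order as the class subscripts above. Since the path in the example is infinite, the sketch truncates it to the vertices 0, ..., 5; this truncation and the helper name `class_bits` are assumptions made only so that the code terminates.

```python
def class_bits(element, view_extensions):
    """Bit string whose j-th character is 1 iff the element satisfies the j-th view."""
    return "".join("1" if element in ext else "0" for ext in view_extensions)


n = 6                                      # finite prefix 0, ..., 5 of the path
universe = range(n)
E = {(i, i + 1) for i in range(n - 1)}

V1 = {x for (x, y) in E}                   # V1(x) <- E(x, y): outgoing arc
V2 = {x for (x, y) in E if x == y}         # V2(x) <- E(x, x): self loop
V3 = {y for (x, y) in E}                   # V3(x) <- E(y, x): incoming arc

classes = {}
for a in universe:
    classes.setdefault(class_bits(a, [V1, V2, V3]), set()).add(a)

print(classes)  # {'100': {0}, '101': {1, 2, 3, 4}, '001': {5}}
```

The classes 100 and 101 agree with the example. The additional class 001 contains only the last vertex of the truncated path, which has an incoming arc but no outgoing one; on the infinite path this class is empty, and the truncated path is for the same reason not a model of $\phi$.
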

The procedure renamel 

Proviso: in subsequent procedures (including the present one), we shall not change 
the second entries (i.e. d) of each node label (i.e. of the form S a x d ) and 
frequently omit mention of them. 
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Fig. 1. A depiction of the justification forest Hi as an output of makeJF. 

The aim of this procedure is to ensure that there are no two justification trees
$T$ and $T'$ with $adom(T) \cap adom(T') \neq \emptyset$. It essentially performs renaming of
constants in $adom(T)$, for each tree $T$ in $\mathcal{H}_1$. This step will later help us guarantee
the correctness of the last step that is used to produce the final model $I_5$, which
relies on a kind of "tree disjointness" property. More formally, we define $I_2$ to
be the disjoint union$^4$ of $\mathcal{D}(T)$ over all trees $T$ in $\mathcal{H}_1$. The justification forest $\mathcal{H}_2$
corresponding to $I_2$ can be obtained from $\mathcal{H}_1$ by renaming constants of the tuples
in each tree $T$ in $\mathcal{H}_1$ accordingly.

Let us continue with our previous example of $\mathcal{H}_1$. The graph $\mathcal{H}_2$ in this case
will be precisely identical to $\mathcal{H}_1$, except that in $T_{101}$ we use the label, say, $S_{0'} =
\{E(0', 1')\}$ (resp. $S_{i'} = \{E((i-1)', i'), E(i', (i+1)')\}$ for $i > 0$) instead of $S_0$ (resp.
$S_i$, for $i > 0$).
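
The renaming step can be sketched as follows. This is an illustration under our own assumptions rather than the procedure itself: the name `rename_apart` is hypothetical, a tree is represented only by the set of tuples occurring in its labels, and every constant is tagged with the identifier of its tree (whereas the example above leaves $T_{100}$ untouched and primes the constants of $T_{101}$).

```python
def rename_apart(trees):
    """Tag every constant with the id of its tree, so that no two trees share a constant.

    trees maps a tree id to a set of tuples (relation_name, args).
    """
    return {
        tid: {(rel, tuple((c, tid) for c in args)) for rel, args in tuples}
        for tid, tuples in trees.items()
    }


def adom(tuples):
    """Active domain of a set of tuples."""
    return {c for _, args in tuples for c in args}


forest = {
    "T100": {("E", (0, 1)), ("E", (1, 2))},
    "T101": {("E", (0, 1)), ("E", (1, 2)), ("E", (2, 3))},
}
renamed = rename_apart(forest)
```

After the renaming, the active domains of the two trees are disjoint, while each tree keeps the same number of tuples and hence remains isomorphic to the original.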

The procedure rename2

The aim of this procedure is to transform the model in such a way that each constant
$a$ can appear only at two consecutive levels, say $j$ and $j+1$, within each tree. It
appears at level $j$ as part of an $S_b$-labeled node $v$, for some constant $b \neq a$, and at
level $j+1$ as part of an $S_a$-labeled node that is a child of $v$. Further, the procedure
ensures that any given constant occurs in at most one node's label at each level in
a tree. Again, this step will later help us guarantee the correctness of the step
that is used to produce the final model $I_5$, which relies on the existence of a kind
of internal "disjointness" property within trees.



$^4$ The disjoint union of two $\sigma$-structures $A$ and $B$ with $A \cap B = \emptyset$ is the structure with universe
$A \cup B$ and relation $R$ interpreted as $R^A \cup R^B$. If $A \cap B \neq \emptyset$, one can simply force disjointness by
renaming constants.
Let us fix a sibling ordering for the nodes within each tree $T_i$ in $\mathcal{H}_2$. Define a set
$U$ of constants disjoint from $I_2$ as follows:

\[ U = \{a_{j,l} : j, l \in \mathbb{N} \text{ and } a \in I_2\}. \]

For $a, b \in I_2$, we require that $a_{j,l} \neq b_{j',l'}$ whenever either $j \neq j'$, or $l \neq l'$, or
$a \neq b$. For each tree $T_i$ and for each $j = 1, 2, \ldots$, choose the $l$th node $v$ with respect
to the fixed sibling ordering (say, $S_a$-labeled) at level $j$ in $T_i$. Let $v$'s children
be $v_1, \ldots, v_k$ (labeled by, respectively, $S_{b_1}, \ldots, S_{b_k}$ with $b_h \neq a$). Now do the
following: change $v$ to $S_a[(b_1)_{j,l}/b_1, \ldots, (b_k)_{j,l}/b_k]$; and change $v_h$, where $1 \leq h \leq k$,
to $S_{(b_h)_{j,l}} = S_{b_h}[(b_h)_{j,l}/b_h]$. Observe that there are two stages in this procedure where
each non-root node at level $j$, say $S_a$-labeled, undergoes relabeling: first when we
are at level $j - 1$ (the constant $a$ is renamed by $a_{j,k}$ for some $k$), and second when
we are at level $j$ (constants other than $a_{j,k}$ are renamed for what is now $S_{a_{j,k}}$). The
output of this procedure on $\mathcal{H}_2$ is denoted by $\mathcal{H}_3$, whose corresponding structure
we denote by $I_3$.

We continue with our previous example. The root node $u_1$ of $T_{100}$ in $\mathcal{H}_2$ is
$S_0 = \{E(0,1)\}$, its child $u_2$ (sibling zero at level 1) is $S_1 = \{E(0,1), E(1,2)\}$
and in turn the children of that child are $u_3 = S_0 = \{E(0,1)\}$ (sibling 0 at level
2) and $u_4 = S_2 = \{E(1,2), E(2,3)\}$ (sibling 1 at level 2). Under the rename2
procedure, node $u_1$ is unchanged, since it is at level zero. Node $u_2$ is changed to
$S_1 = \{E(0_{1,0}, 1), E(1, 2_{1,0})\}$. Node $u_3$ is changed to $S_{0_{1,0}} = \{E(0_{1,0}, 1_{2,0})\}$ and $u_4$
is changed to $S_{2_{1,0}} = \{E(1_{2,1}, 2_{1,0}), E(2_{1,0}, 3_{2,1})\}$.
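
The relabeling of node $u_2$ amounts to a substitution of constants in a set of tuples, which can be sketched as follows. The function `substitute` and the string encoding of the fresh constants are our own illustrative assumptions; only the treatment of $u_2$ at level 1 is shown.

```python
def substitute(tuples, mapping):
    """Apply a renaming of constants to every tuple of a justification set."""
    return {(rel, tuple(mapping.get(c, c) for c in args)) for rel, args in tuples}


# Label S_1 of node u_2 (sibling 0 at level 1).
u2 = {("E", (0, 1)), ("E", (1, 2))}

# The constants 0 and 2 of the children of u_2 are replaced by fresh
# constants indexed by (level 1, sibling 0); the constant 1 is kept.
fresh = {0: "0_{1,0}", 2: "2_{1,0}"}
u2_renamed = substitute(u2, fresh)
```

The result contains the tuples $E(0_{1,0}, 1)$ and $E(1, 2_{1,0})$, matching the new label of $u_2$ given above.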

The procedure copy

This procedure makes a number of isomorphic copies of the model $\mathcal{H}_3$ and then
unions them together. Duplicating the model in this way facilitates the construction
of a bounded model by the prune procedure, which will be described shortly. Let $\delta$
be the total number of constants that appear in some tuples from a node label at
level $h := cm$ in $\mathcal{H}_3$, for some fixed $c \in \mathbb{N}$, independent of $\phi$, whose value will
later become clear in the proofs that follow. By virtue of procedure makeJF, we are
guaranteed that each node in $\mathcal{H}_3$ can have at most $N \times m$ children, where $N \times m$
represents an upper bound on the number of constants each justification set might
contain. Since there are at most $2^N$ trees in $\mathcal{H}_3$, by elementary counting, we see
that $\delta \leq 2^N \times (N \times m)^h$. Now, letting $g := cm$, make $\Delta := \delta^g$ (isomorphic) copies
of $\mathcal{H}_3$, each with a disjoint set of constants. That is, the node labeling of each new
copy of $\mathcal{H}_3$ is isomorphic to that of $\mathcal{H}_3$, except that it uses a disjoint set of constants.
Let us call the copies $\mathcal{B}_1, \ldots, \mathcal{B}_\Delta$ (the original copy of $\mathcal{H}_3$ is included). So, we
have $\mathcal{B}_i \cap \mathcal{B}_j = \emptyset$, for $i \neq j$. For each tree $T_i$ in $\mathcal{H}_3$, we denote by $T_i^k$ the isomorphic
copy of $T_i$ in $\mathcal{B}_k$. Now, let

\[ \mathcal{H}_4 = \mathcal{B}_1 \cup \ldots \cup \mathcal{B}_\Delta. \]

The structure corresponding to $\mathcal{H}_4$ is denoted by $I_4$. In the sequel, each node at
level $h$ in $\mathcal{B}_k$ is said to be a (potential) leaf of $\mathcal{B}_k$.

The procedure prune 

The purpose of this procedure is to transform $\mathcal{H}_4$ into a finite model. Intuitively,
this is achieved by "pruning" all trees at level $h$ and then rejustifying the resulting
unjustified constants by "linking" them to a justification being used in some other
part of the model. This is the most complex step in the entire sequence of procedures,
and care will be needed later to prove that satisfiability is not
violated when constants are being rejustified.

We begin first by describing the connections that we wish to construct between
the different parts of the model. Roughly speaking, the model we intend
to construct somewhat resembles a $\delta$-regular graph, whose nodes are the copies
$\mathcal{B}_1, \ldots, \mathcal{B}_\Delta$ made earlier, and where edges between copies indicate that one copy
is being used to make a new justification for a node at level $h$ in another copy.

Firstly though, we state a proposition from extremal graph theory (see [Bollobas
2004, Theorem 1.4', Chapter III] for proof) that can be used to guarantee the
existence of the kind of $\delta$-regular graph we intend to construct.

Proposition 4.6. Fix two positive integers $\delta, g$ and take an integer $\Delta$ with

\[ \Delta \geq \frac{(\delta - 1)^{g-1} - 1}{\delta - 2}. \]

Then, there exists a $\delta$-regular graph of size $\Delta$ with girth at least $g$.

Using $\delta$, $g$ and $\Delta$ as defined in the copy procedure, this proposition implies that
there exists a $\delta$-regular graph $G$ with vertices $\{\mathcal{B}_1, \ldots, \mathcal{B}_\Delta\}$ and with girth at least
$g$. Let us now treat $G$ as a directed graph, where each edge in $G$ is regarded as
two arcs, one in each direction.
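
That the choice $\Delta := \delta^g$ from the copy procedure meets the hypothesis of Proposition 4.6 can be checked numerically for small parameters. The following is a minimal sketch; the function names are hypothetical, and we assume $\delta \geq 3$ so that the denominator $\delta - 2$ is positive.

```python
from fractions import Fraction


def girth_lower_bound(delta, g):
    """Right-hand side ((delta - 1)^(g - 1) - 1) / (delta - 2) of Proposition 4.6."""
    if delta < 3 or g < 1:
        raise ValueError("the sketch assumes delta >= 3 and g >= 1")
    return Fraction((delta - 1) ** (g - 1) - 1, delta - 2)


def number_of_copies(delta, g):
    """The number Delta := delta^g of copies made by the copy procedure."""
    return delta ** g


# Check the hypothesis Delta >= bound on a small grid of parameters.
hypothesis_holds = all(
    number_of_copies(d, g) >= girth_lower_bound(d, g)
    for d in range(3, 10)
    for g in range(1, 10)
)
```

For instance, for $\delta = 3$ and $g = 3$ the bound evaluates to $(2^2 - 1)/1 = 3$, whereas $\Delta = 3^3 = 27$.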

Observe that, for each vertex $\mathcal{B}_k$, there is a bijection $out_k$ from the set of leaves
(nodes at height $h$) of $\mathcal{B}_k$ to the set of arcs going out from $\mathcal{B}_k$ in $G$. We next
take each leaf of $\mathcal{B}_k$ in turn. For a leaf $v$ (say, $S_b$-labeled), suppose that $out_k(v) =
(\mathcal{B}_k, \mathcal{B}_{k'})$. Choose $i$ such that $b \in C_i(I_4)$. If the root of $T_i^{k'}$ is $S_c$-labeled, for
some $c \in I_4$, then we delete all descendants of $v$ and change $v$ to $S_c[b/c]$.
In this way, we "prune" each of the trees in $\mathcal{H}_4$, and link each leaf node to the
root node of another tree for the purpose of justification. We denote by $\mathcal{H}_5$ the
resulting collection of interlinked models, whose corresponding structure is denoted
by $I_5$. $\mathcal{H}_5$ can be thought of as a collection of interlinked forests, where each forest
corresponds to one of the copies $\{\mathcal{B}_1, \ldots, \mathcal{B}_\Delta\}$ and each forest is a collection of trees.

Observe now that each "tree" in $\mathcal{H}_5$ is of height $h$. Since there are at most $\Delta \times 2^N$
trees in $\mathcal{H}_5$, each of which has at most $(N \times m)^{h+1}$ constants, we see that

\[ |I_5| \leq (\Delta \times 2^N) \times (N \times m)^{h+1} \]

\[ \leq ((2^N \times (Nm)^{cm})^{cm} \times 2^N) \times (N \times m)^{cm+1}. \]

It is easy to calculate now that $|I_5| \leq 2^{2^{q(p,m)}}$ for some polynomial $q$ in $p$ and $m$. We
have thus managed to construct a bounded model $I_5$ which satisfies the original
UCV formula $\phi$. □
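
To get a feeling for the size of this bound, it can be evaluated for small parameters. The following is only a sketch under the reading of the estimate given above ($h = g = cm$, $\delta \leq 2^N (Nm)^h$, $\Delta = \delta^g$); the function name `model_size_bound` is hypothetical.

```python
def model_size_bound(N, m, c):
    """Evaluate (Delta * 2^N) * (N * m)^(h + 1) with h = g = c * m."""
    h = c * m
    g = c * m
    delta = 2 ** N * (N * m) ** h   # bound on the constants at level h
    copies = delta ** g             # Delta, the number of copies
    trees = copies * 2 ** N         # at most Delta * 2^N trees
    return trees * (N * m) ** (h + 1)


bound = model_size_bound(N=3, m=2, c=1)
```

Already for $N = 3$, $m = 2$ and $c = 1$ the bound exceeds $10^8$, which illustrates the doubly exponential growth once $N$ itself is exponential in the size of the input.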

We now prove the correctness of our construction for theorem 4.1. The proof is 
divided into a series of subproperties that assert the correctness of each procedure 
in our construction. First, we prove a simple lemma. 

Lemma 4.7. Let $\mathcal{V}$ be a set of (unary) views over a vocabulary $\sigma$. Suppose $I, J$
are $\sigma$-structures such that $C(I)$ is non-empty iff $C(J)$ is non-empty for each equivalence
class formula $C$ constructed with respect to $\mathcal{V}$. Then, $I$ and $J$ agree on
$UCV(\sigma, \mathcal{V})$.

Proof. By a standard Ehrenfeucht-Fraisse argument, we see that $A(I)$ and $A(J)$
agree on $UFO(\mathcal{V})$. Then, by lemma 3.1, we have that $I$ and $J$ agree on $UCV(\sigma, \mathcal{V})$. □

Subproperty 4.8 (Correctness of makeJF). (1) For each node $v$ (say, $(S_a \times
C_i)$-labeled) of $\mathcal{H}_1$, $a \in C_i(I_1)$ with witnessing structure $S_a$.

(2) If $I_0 \models \phi$, then $I_1 \models \phi$.

Proof. First, note that $I_1 \subseteq I_0$. Since conjunctive queries are monotonic, we
have $V(I_1) \subseteq V(I_0)$ for each view $V \in \mathcal{U}$. So, we have that $a \in V(I_1)$ implies that
$a \in V(I_0)$. In addition, for each constant $a \in I_1$, if $a \in V(I_0)$, then $a \in V(I_1)$,
which is witnessed at some $S_a$-labeled node. In turn, this implies that for $a \in I_1$,
it is the case that $a \in C(I_1)$ iff $a \in C(I_0)$. This proves the first statement. Also, by
construction, if $C_i(I_0)$ is non-empty, where $i \in \{0, \ldots, 2^N - 1\}$, we know that one
of its members belongs to $I_1$, witnessed at the root of $T_i$. Therefore, we also have
that $C(I_0)$ is non-empty iff $C(I_1)$ is non-empty. In view of lemma 4.7, we conclude
the second statement. □

At this stage, it is worth noting that, once $\mathcal{H}_1$ has been constructed, the subsequent
procedures might modify the label $(S_a \times C_i)$: its name (e.g. from $S_a \times C_i$ to
$S_{a'} \times C_i$ for some new constant $a'$) as well as its contents (e.g. replacing each
occurrence of a tuple $R(a, a)$ by $R(a', a')$ for some new constant $a'$). Despite this,
we wish to highlight that one invariant is preserved by each of the procedures
that have been described:

Invariant 4.9 (Justification Set). Suppose $\mathcal{H}$ is a justification forest of a
structure $I$, and $v$ is an $(S_a \times C_i)$-labeled node of $\mathcal{H}$. Then, we have $a \in C_i(I)$ with
witnessing structure $S_a$.

Subproperty 4.8 shows that this is satisfied by $\mathcal{H}_1$. In fact, that this invariant
is preserved by the later procedures will be almost immediate from the proof of
correctness of each procedure. Hence, we leave it to the reader to verify.

Subproperty 4.10 (Correctness of rename1). If $I_1 \models \phi$, then $I_2 \models \phi$.

Proof. In this procedure, we perform constant renaming for each tree $T_i$ in
$\mathcal{H}_1$. For the purpose of this proof, let us denote the tree so obtained by $T_i'$. Such
a renaming induces a bijection $f_i : adom(T_i) \rightarrow adom(T_i')$. Extend $f_i$ to tuples,
structures, and trees in the obvious way. Observe that the structures corresponding
to the trees $f_i(T_i)$ and $T_i$ are isomorphic. Now, in view of lemma 4.11, it is easy to
check that for each tree $T_i$ in $\mathcal{H}_1$ and a constant $a$ in the structure corresponding
to $T_i$, $a \in C_j(I_1)$ iff $f_i(a) \in C_j(I_2)$. By virtue of lemma 4.7, we conclude our
proof. □

Lemma 4.11. Suppose that $a$ is a constant in the structure corresponding to $T_i$
of $\mathcal{H}_1$. Then, $a \in V(I_1)$ iff $f_i(a) \in V(I_2)$.

Proof. ($\Rightarrow$) By subproperty 4.8, it is the case that $a \in V(S_a)$. Since $f_i$ is a
bijection, it is also true that $f_i(a) \in V(f_i(S_a))$.
($\Leftarrow$) Let $M$ be a minimal set of tuples in $I_2$ such that $f_i(a) \in V(M)$. Observe
that there is a one-to-one function mapping the set of conjuncts in $V$ to $M$. For
each tree $T_j$ in $\mathcal{H}_2$, let $M_j$ denote the members of $M$ that can be found in $T_j$. Note
that $adom(M_j) \cap adom(M_{j'}) = \emptyset$ for $j \neq j'$. Now, let $M' = \bigcup_j f_j^{-1}(M_j)$. It is not
hard to see that $a \in V(M')$. Since $M' \subseteq I_1$, we have $a \in V(I_1)$. □

Subproperty 4.12 (Correctness of rename2). If $I_2 \models \phi$, then $I_3 \models \phi$.

Proof. Define the function $\eta : I_3 \rightarrow I_2$ such that $\eta(a_{j,k}) = a$. Note that $\eta$ is
onto. Extend $\eta$ to tuples and sets of tuples in the obvious way. In view of lemma
4.7, it is sufficient to show that, for each $a \in I_3$ and $i \in \{0, \ldots, 2^N - 1\}$, $a \in C_i(I_3)$
iff $\eta(a) \in C_i(I_2)$. In turn, it is enough to show that $a \in V(I_3)$ iff $\eta(a) \in V(I_2)$.

($\Rightarrow$) Take a minimal set $M$ of tuples in $I_3$ such that $a \in V(M)$. Then, we have
$\eta(a) \in V(\eta(M))$. Since $\eta(M) \subseteq I_2$, we have $\eta(a) \in V(I_2)$.

($\Leftarrow$) Since invariant 4.9 holds for $\mathcal{H}_2$, the fact that $\eta(a) \in V(I_2)$ is witnessed
by $S_{\eta(a)} \in \mathcal{H}_2$. Since $S_a$ and $S_{\eta(a)}$ are isomorphic justification sets, we have that
$a \in V(I_3)$ is justified by $S_a \in \mathcal{H}_3$. □

Subproperty 4.13 (Correctness of copy). (1) For each node $v$ (say, $(S_a \times
C_i)$-labeled) of $\mathcal{H}_4$, $a \in C_i(I_4)$ with witnessing structure $S_a$.
(2) If $I_3 \models \phi$, then $I_4 \models \phi$.

Proof. Similar to the proof of subproperty 4.10. □

Subproperty 4.14 (Correctness of prune). If $I_4 \models \phi$, then $I_5 \models \phi$.

Proof. Recall that there are $N_l = \delta \times \Delta$ leaves in $\mathcal{H}_4$. Let us order these nodes
as $v_1, \ldots, v_{N_l}$. Suppose also that $v_i$ is labeled by $S_{b_i}$ for some $b_i \in I_4$. By virtue of
rename2, we see that $b_i \neq b_j$ whenever $i \neq j$. Next, we may think of the procedure
prune as consisting of $N_l$ steps, where at step $i$, the node $v_i$ has all its descendants
removed (pruned) and $v_i$ is changed to $S'_{b_i} = S_{c_i}[b_i/c_i]$ for some $c_i \in I_4$. Letting
$\mathcal{K}_0 = \mathcal{H}_4$, we denote by $\mathcal{K}_i$ ($i = 1, \ldots, N_l$) the resulting model after executing $i$
steps on $\mathcal{K}_0$. The structure corresponding to $\mathcal{K}_i$ is denoted by $J_i$.
We wish to prove by induction on $0 \leq i < N_l$ that

(I). For each $a \in J_{i+1}$ and $V \in \mathcal{U}$, $a \in V(J_{i+1})$ iff $a \in V(J_i)$.

(II). Invariant 4.9 holds for $J_{i+1}$.

(III). For each $a \in J_{i+1}$, we have $a \in C_j(J_{i+1})$ iff $a \in C_j(J_i)$.

Note that $J_{i+1} \subseteq I_4$. So, by lemma 4.7 and the fact that invariant 4.9 holds for
the initial case $J_0$ (from proofs of previous subproperties), statement (III) will imply
what we wish to prove. It is easy to see that statement (III) is a direct consequence
of statement (I). It is also easy to show that statement (I) implies statement (II).
This follows since, firstly, at step $i + 1$ we replace the content of $S_{b_i}$ by that of $S_{c_i}$,
except for substituting $b_i$ for $c_i$. Second, the elements $b_i$ and $c_i$ belong to the same
equivalence class in $J_i$, and invariant 4.9 holds for $J_i$ by induction. Therefore, it
remains only to prove statement (I).

Let us now fix $i < N_l$, $a \in J_{i+1}$, and $V \in \mathcal{U}$. It is simple to prove that $a \in V(J_i)$
implies $a \in V(J_{i+1})$. This is witnessed by tuples in the $S_a$-labeled (or $S'_{b_i}$-labeled
if $a = b_i$) node in $\mathcal{K}_{i+1}$, which exists by construction.
Conversely, we take a minimal set $M$ of tuples in $J_{i+1}$ with $a \in V(M)$, witnessed
by the valuation $\nu$. Our aim is to find a set $M'$ of tuples in $J_i$ with
$a \in V(M')$. Let $M_{b_i} = M - \mathcal{D}(J_i)$. Intuitively, $M_{b_i}$ contains the set of new tuples.
These are tuples which did not exist in the structure $J_i$ and have been created specifically
to justify the node whose descendants (justifications) have just been pruned.
By construction, we have $M_{b_i} \subseteq S'_{b_i}$, which implies that $adom(M_{b_i}) \subseteq adom(S'_{b_i})$.
Observe also that $b_i \in adom(t)$ for each tuple $t$ in $M_{b_i}$; otherwise, $t$ would be a
tuple in $S_{c_i} \subseteq J_i$ (i.e. it would not be a new tuple). Define

\[ L := \{t \in M - M_{b_i} : t \text{ is connected to some } t' \in M_{b_i} \text{ in } M\}. \]

$L$ consists of tuples that are connected to new tuples. Also, let $L' = M - M_{b_i} - L$,
i.e., the set of all tuples of $M$ that are not connected to any (new) tuples in $M_{b_i}$.
Note that $L \cup L' \subseteq J_i$, and that the sets $M_{b_i}$, $L$, and $L'$ form a partition of $M$.
Also, by definition, we have $adom(L') \cap adom(M_{b_i} \cup L) = \emptyset$. In the following, we
define $M_{c_i} = M_{b_i}[c_i/b_i]$. Note that $M_{c_i} \subseteq S_{c_i} \subseteq J_i$.

Before we proceed further, it is helpful to see how we partition $M$ on a simple
example. Suppose that the view $V$ is defined as

\[ V(x_0) \leftarrow E(x_0, x_1), E(x_1, x_2), R(x_3, x_4), R(x_4, x_5). \]

Furthermore, suppose that we take the valuation $\nu$ defined as $\nu(x_i) = i$. In this
case, $M$ can be described diagrammatically as follows

\[ V(0) \leftarrow E(0, 1), E(1, 2), R(3, 4), R(4, 5). \]

Assume now that the only tuple in $M$ that doesn't belong to $\mathcal{D}(J_i)$ is $E(0, 1)$.
Then, we have $M_{b_i} = \{E(0,1)\}$. It is easy to show that $L = \{E(1,2)\}$ and
$L' = \{R(3,4), R(4,5)\}$.
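
The partition in this example can be reproduced by a small reachability computation. The following is a sketch under an assumption of ours: two tuples are taken to be connected in $M$ if they are linked by a chain of tuples of $M$ in which consecutive tuples share a constant. The function name `partition` and the tuple encoding are hypothetical.

```python
def partition(M, new):
    """Split M into (M_b, L, L_prime).

    M_b holds the new tuples, L the old tuples connected to a new tuple,
    and L_prime the remaining tuples. A tuple is a pair (relation, args).
    """
    M = set(M)
    M_b = M & set(new)
    reached = set(M_b)
    frontier = list(M_b)
    while frontier:
        t = frontier.pop()
        for u in M - reached:            # snapshot of the unreached tuples
            if set(t[1]) & set(u[1]):    # t and u share a constant
                reached.add(u)
                frontier.append(u)
    return M_b, reached - M_b, M - reached


M = {("E", (0, 1)), ("E", (1, 2)), ("R", (3, 4)), ("R", (4, 5))}
M_b, L, L_prime = partition(M, new={("E", (0, 1))})
```

Here `L` contains only $E(1,2)$, which shares the constant 1 with the new tuple, and `L_prime` contains the two $R$-tuples, whose constants are disjoint from those of the other two sets.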

We next state a result regarding $L$ that will shortly be needed. It clarifies the
nature of a partition that exists for $L$ and the relationships which hold between the
elements of the partition.

Proposition 4.15. We can find tuple-sets $A, B \subseteq L$ such that:

(1) $A \cap B = \emptyset$,

(2) $A \cup B = L$,

(3) $adom(A) \cap adom(B) = \emptyset$,

(4) $b_i \notin adom(B)$, and

(5) $adom(M_{b_i}) \cap adom(A) \subseteq \{b_i\}$.

The proof of this proposition can be found at the end of this section. We now
shall construct $M' \subseteq \mathcal{D}(J_i)$ such that $a \in V(M')$. First, we put $L'$ in $M'$. This
does not affect our choice of tuple-sets that replace $M_{b_i}$, $A$, and $B$ as $adom(L') \cap
adom(M_{b_i} \cup L) = \emptyset$ (i.e. the set of free tuples instantiated by $L'$ and the set of
free tuples instantiated by $M_{b_i} \cup L$ share no common variables), as we have noted
earlier. There are two cases to consider:

case 1. $a = b_i$. Let $F$ be the set of all free tuples in the body of $V$ such that
$\{\nu(u) : u \in F\} = M_{b_i} \cup B$. Suppose $X$ is the set of all variables in $V$. Let
$\{y_1, \ldots, y_r\} \subseteq X$ be the set of variables in $F$ such that $\nu(y_j) = b_i$. With $y$ as a new
variable, let $F' := F[y/y_1, \ldots, y_r]$. Define the new view $V'(y)$ whose conjuncts are
exactly $F'$:

\[ V'(y) \leftarrow \bigwedge F'. \]

Trivially, we have $b_i \in V'(M_{b_i} \cup B)$. Then, as $b_i \notin adom(B)$ by proposition 4.15, we
have $c_i \in V'(M_{c_i} \cup B)$. Note that $M_{c_i} \cup B \subseteq \mathcal{D}(J_i)$ and $V' \in \mathcal{U}$ since $length(V') \leq
m$. So, since by induction $b_i$ and $c_i$ belong to the same equivalence class in $J_i$, there
exist tuple-sets $P_{b_i}$ and $B'$ with $P_{b_i} \cup B' \subseteq J_i$ such that $b_i \in V'(P_{b_i} \cup B')$. [$P_{b_i}$ and
$B'$, respectively, replace the role of $M_{c_i}$ and $B$.] Observe now that $a \notin adom(B)$
as $a = b_i$. Since $adom(M_{b_i}) \cap adom(A) \subseteq \{b_i\}$ and $adom(A) \cap adom(B) = \emptyset$ from
proposition 4.15, it is easy to verify that

\[ a \in V(P_{b_i} \cup A \cup B' \cup L'). \]

case 2. $a \neq b_i$. This is divided into two further cases:
(a). $b_i \in adom(A)$. This is divided into two further cases:

(i). $a \in adom(A)$. In this case, note that $a \notin adom(M_{b_i})$ (using Proposition
4.15(5)) and $a \notin adom(B)$ (using Proposition 4.15(3)). We can then continue in
the same fashion as in case 1.

(ii). $a \notin adom(A)$. Let $F$ be the set of all free tuples in the body of $V$ such that
$\{\nu(u) : u \in F\} = A$. Let $\{y_1, \ldots, y_r\} \subseteq X$ be the set of variables in $F$ such that
$\nu(y_j) = b_i$. Let $y$ be a new variable (i.e. $y \notin X$) and $F' := F[y/y_1, \ldots, y_r]$, i.e., we
replace each occurrence of the variables $y_1, \ldots, y_r$ in $F$ by $y$. Then, let $V'(y)$ be
the view whose conjuncts are exactly $F'$:

\[ V'(y) \leftarrow \bigwedge F'. \]

Then, $V' \in \mathcal{U}$ and $b_i \in V'(A)$. Since $A \subseteq \mathcal{D}(J_i)$ and because $b_i$ and $c_i$ belong
to the same equivalence class in $J_i$ (by the induction hypothesis), there exists a
set $A' \subseteq \mathcal{D}(J_i)$ such that $c_i \in V'(A')$. Since $adom(M_{b_i}) \cap adom(A) \subseteq \{b_i\}$
and $adom(A) \cap adom(B) = \emptyset$ from proposition 4.15, it is easy to check that $a \in
V(M_{c_i} \cup A' \cup B \cup L')$.

(b). $b_i \notin adom(A)$. Let $M_{c_i} := M_{b_i}[c_i/b_i]$. By construction, we see that $M_{c_i} \subseteq
S_{c_i} \subseteq \mathcal{D}(J_i)$. By proposition 4.15 (items 4 and 5), it is the case that

\[ a \in V(M_{c_i} \cup A \cup B \cup L'). \]

In any case, we have $a \in V(J_i)$. This completes the proof.

□ 

It remains to prove proposition 4.15. 

Proof of proposition 4.15. The present situation is depicted in figure 2.
This is a snapshot of the moment just before we apply step $i + 1$. Step $i$ of the prune
procedure simply prunes the subtree rooted at the $S_{b_i}$-labeled node $v$, and links
(rejustifies) $v$ using the $S_{c_i}$-labeled node $w$, where $b_i$ and $c_i$ belong to the same
equivalence class in $J_i$. It is important to note that some cousin$^5$ $w_3$ of $v$ might

$^5$ A cousin of a node is a node of the same tree and level.
Fig. 2. $v$ is the $S_{b_i}$-labeled node whose contents are to be changed to $S_{c_i}[b_i/c_i]$. The node $w$
is $S_{c_i}$-labeled, and will be "linked" to node $v$ after step $i + 1$ of the prune procedure is finished,
as signified by the dotted line. Solid lines represent links that have been established in
step $j < i + 1$ of the procedure.

also be linked to a root $w_6$ of another tree, which in turn might be linked to a leaf
node $w_2$ of another tree, which in turn might have a cousin $w_1$ that satisfies the
same property as $w_3$ and so on. Furthermore, the node $w$ might have also been
linked to some other leaf $w_4$ that has a cousin $w_5$ that is connected to a root of
some other tree, and so on. Note that it is impossible for two leaves of a tree to
be linked to the same root node of a tree by construction. Hence, the three trees
in the middle (i.e. where $v$, $w$, and $w_6$ are located) are necessarily distinct. The
leftmost and rightmost tree might be the same tree depending on the value of the
girth $g$ that we defined earlier.
Let us now define

\[ A := \{t \in L : d_{G(J_i)}(t, b_i) \leq m\} \]

\[ B := \{t \in L : d_{G(J_i)}(t, S_{c_i}) \leq m\}. \]

Intuitively, the set $A$ contains tuples up to distance $m$ from the label of node
$v$ in $J_i$, while the set $B$ contains tuples up to distance $m$ from the label $S_{c_i}$ of $w$
in $J_i$. Note that this is distance in the structure $J_i$, not $J_{i+1}$. It is immediate that
we have property (2) $A \cup B = L$, as the length of the view $V$ is at most $m$ and
$adom(M_{b_i}) \subseteq \{b_i\} \cup adom(S_{c_i})$. So, it is sufficient to show that properties
3 and 5 are satisfied, as they obviously imply properties 1 and 4. Note that our
construction has ensured that:

(1) Two nodes in any given tree in $\mathcal{K}_i$ that are at least distance two apart cannot
share a constant.

(2) Two trees $T$ and $T'$ in $\mathcal{K}_i$ cannot share a constant except on: (i) a unique leaf
of $T$ and the root of $T'$, as is the case for $v$ and $w$ in Figure 2, or alternatively
(ii) a unique leaf of $T$ and a unique leaf of $T'$. This case can happen when both
leaves are connected to the root of a different tree $T''$, as is the situation for $w_2$
and $w_3$ in Figure 2.

Therefore, for some sufficiently large constant $c' \in \mathbb{N}$, two nodes $v'$ and $v''$ in
$\mathcal{K}_i$ of distance $c'm$ cannot have two elements of $J_i$ that are of distance $\leq m$ in
$G(J_i)$. [In fact, a careful analysis will show that $c' = 1$ is sufficient.] Therefore, the
locations of the constants in $A$ (resp. $B$) cannot be "very far away" from the tuple
$v$ (resp. $w$). In fact, if we set $c > c'$ (recall that $g = cm$) and consider the path
$P$ between a tuple $t \in A$ and the constant $b_i$ (which belongs to $v$ and its parent),
it cannot connect a root and a leaf of the same tree (i.e. through the body of the
tree). So, either it is completely contained in the tree of which $v$ is a leaf, or it has
to alternate between leaves and roots several times, and then end in some
tree. In figure 2, we may pick the following example

\[ v \rightarrow^* w_3 \rightarrow w_6 \rightarrow w_2 \rightarrow^* w_1 \rightarrow \ldots, \]

where we use the notation $\rightarrow^*$ to mean "path in the same tree". The same analysis
can be applied to determine the locations of the tuples of $B$. Therefore, in order to
ensure that properties 3 and 5 are satisfied, we just need to ensure that the height
of each tree and the girth of $\mathcal{K}_i$ are large enough, which can be done by taking a
sufficiently large $c$. When the girth (as ensured in the copy and prune procedures)
is sufficiently large, we can be sure that no paths of length $\leq m$ exist between $v$ and
$w$ in $\mathcal{K}_i$. [In fact, a careful but tedious analysis shows that $c = 1$ is sufficient.] □

Theorem 4.1 also holds for infinite models, since even if the initial justification
hierarchies are infinite, the proof method used is unchanged. We thus also obtain
finite controllability (every satisfiable formula is finitely satisfiable) for UCV.

Proposition 4.16. The UCV class of formulas is finitely controllable. 
5. EXTENDING THE VIEW DEFINITIONS 

The previous section showed that the first order language using unary conjunctive 
view definitions is decidable. A natural way to increase the power of the language 
is to make view bodies more expressive (but retain unary arity for the views). We
saw earlier that allowing unary views to use disjunction in their definition does
not actually increase expressiveness of the UCV language and hence this case is 
decidable. Unfortunately, as we will show, employing other ways of extending the 
views results in satisfiability becoming undecidable. 

The first extension we consider is allowing inequality in the views, e.g.,

\[ V(x) \leftarrow R(x, y), S(x, x), x \neq y \]

Call the first order language over such views the first order unary conjunctive$^{\neq}$ view
language. In fact, this language allows us to check whether a two counter machine
computation is valid and terminates, which thus leads to the following result:

Theorem 5.1. Satisfiability is undecidable for the first order unary conjunctive$^{\neq}$
view query language.

Proof. The proof is by a reduction from the halting problem of two counter 
machines (2CM's) starting with zero in the counters. Given any description of 
a 2CM and its computation, we can show how to a) encode this description in 
database relations and b) define queries to check this description. We construct 
a query which is satisfiable iff the 2CM halts. The basic idea of the simulation is 
similar to one in [Levy et al. 1993], but with the major difference that cycles are 
allowed in the successor relation, though there must be at least one good chain. 
A two-counter machine is a deterministic finite state machine with two non-negative
counters. The machine can test whether a particular counter is empty or
non-empty. The transition function has the form

\[ \delta : S \times \{=, >\} \times \{=, >\} \rightarrow S \times \{pop, push\} \times \{pop, push\} \]

For example, the statement $\delta(4, =, >) = (2, push, pop)$ means that if we are in
state 4 with counter 1 equal to 0 and counter 2 greater than 0, then go to state 2
and add one to counter 1 and subtract one from counter 2.

The computation of the machine is stored in the relation $config(t, s, c_1, c_2)$, where
$t$ is the time, $s$ is the state and $c_1$ and $c_2$ are values of the counters. The states of the
machine can be described by integers $0, 1, \ldots, h$ where 0 is the initial state and $h$ the
halting (accepting) state. The first configuration of the machine is $config(0, 0, 0, 0)$
and thereafter, for each move, the time is increased by one and the state and counter
values changed in correspondence with the transition function.
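
To make the intended contents of the relation config concrete, the following sketch simulates a two-counter machine and records its run. It is an illustration only: the function `run_2cm`, the dictionary encoding of the transition function and the step bound `max_steps` are our own assumptions, and a pop on an empty counter is treated as an error.

```python
def run_2cm(delta, halting_state, max_steps=1000):
    """Return the run [(t, s, c1, c2), ...] from config(0, 0, 0, 0).

    delta maps (state, test1, test2) to (state', op1, op2), where each test
    is "=" or ">" and each op is "push" or "pop". Returns None if the
    halting state is not reached within max_steps moves.
    """
    s, c1, c2 = 0, 0, 0
    config = [(0, s, c1, c2)]
    for t in range(1, max_steps + 1):
        if s == halting_state:
            return config
        key = (s, ">" if c1 > 0 else "=", ">" if c2 > 0 else "=")
        s, op1, op2 = delta[key]
        c1 += 1 if op1 == "push" else -1
        c2 += 1 if op2 == "push" else -1
        if c1 < 0 or c2 < 0:
            raise ValueError("pop on an empty counter")
        config.append((t, s, c1, c2))
    return config if s == halting_state else None


# A toy machine with halting state h = 2.
toy = {
    (0, "=", "="): (1, "push", "push"),
    (1, ">", ">"): (2, "pop", "pop"),
}
run = run_2cm(toy, halting_state=2)

# A machine that pushes forever and never reaches state 1.
looping = {
    (0, "=", "="): (0, "push", "push"),
    (0, ">", ">"): (0, "push", "push"),
}
no_run = run_2cm(looping, halting_state=1, max_steps=5)
```

The list `run` is exactly the set of tuples that a database encoding this computation would store in config; the reduction below asks whether some database of this shape exists.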

We will use some relations to encode the computation of 2CMs starting with zero
in the counters. These are:

— $S_0, \ldots, S_h$: each contains a constant which represents that particular state.
— succ: the successor relation. We will make sure it contains one chain starting
from zero and ending at last (but it may in addition contain unrelated cycles).
— config: contains the computation of the 2CM.
— zero: contains the first constant in the chain in succ. This constant is also used
as the number zero.
— last: contains the last constant in the chain in succ.

Note that we sometimes blur the distinction between unary relations and unary
views, since a view $V$ can simulate a unary relation $U$ if it is defined by $V(x) \leftarrow
U(x)$.

The unary and nullary views (the latter can be eliminated using quantified unary
views) are:

— halt: true if the machine halts.

— bad: true if the database doesn't correctly describe the computation of the 2CM.

— dsucc: contains all constants in succ.

— dT: contains all time stamps in config.

— dP: contains all constants in succ with predecessors.

— $dCol_1$, $dCol_2$: are projections of the first and second columns of succ.

When defining the views, we also state some formulas (such as hasPred) over
the views which will be used to form our first order sentence over the views.

— The "domain" views (those starting with the letter d) are easy to define, e.g.

\[ dP(x) \leftarrow succ(z, x) \]
\[ dCol_1(x) \leftarrow succ(x, y) \]
\[ dCol_2(x) \leftarrow succ(y, x) \]

— hasPred says "each nonzero constant in succ has a predecessor":

\[ hasPred : \forall x\,(dsucc(x) \Rightarrow (zero(x) \vee dP(x))) \]

— sameDom says "the constants used in succ and the timestamps in config are
the same set":

\[ sameDom : \forall x\,(dsucc(x) \Rightarrow dT(x)) \wedge \forall y\,(dT(y) \Rightarrow dsucc(y)) \]

— goodzero says "the zero occurs in succ":

\[ goodzero : \forall x\,(zero(x) \Rightarrow dsucc(x)) \]

— nempty: each of the domains and unary base relations is not empty

\[ nempty : \exists x\,(dsucc(x)) \]

— Check that each constant in succ has at most one successor and at most one
predecessor and that it has no cycles of length 1.

\[ bad \leftarrow succ(x, y), succ(x, z), y \neq z \]
\[ bad \leftarrow succ(y, x), succ(z, x), y \neq z \]
\[ bad \leftarrow succ(x, x) \]

Note that the first two of these rules could be enforced by database style functional
dependencies $x \rightarrow y$ and $y \rightarrow x$ on the relation succ.
— Check that every constant in the chain in succ which isn't the last one must have
a successor

\[ hassuccnext : \forall y\,(dCol_2(y) \Rightarrow (last(y) \vee dCol_1(y))) \]

— Check that the last constant has no successor and zero (the first constant) has
no predecessor.

\[ bad \leftarrow last(x), succ(x, y) \]
\[ bad \leftarrow zero(x), succ(y, x) \]

— Check that every constant eligible to be in last and zero must be so.

\[ eligiblezero : \forall y\,(dCol_1(y) \Rightarrow (dCol_2(y) \vee zero(y))) \]
\[ eligiblelast : \forall y\,(dCol_2(y) \Rightarrow (dCol_1(y) \vee last(y))) \]

— Each $S_i$ and zero and last contain $\leq 1$ element.

\[ bad \leftarrow S_i(x), S_i(y), x \neq y \]
\[ bad \leftarrow zero(x), zero(y), x \neq y \]
\[ bad \leftarrow last(x), last(y), x \neq y \]

— Check that $S_i$, $S_j$, last, zero are disjoint ($0 \leq i < j \leq h$):

\[ bad \leftarrow zero(x), last(x) \]
\[ bad \leftarrow S_i(x), S_j(x) \]
\[ bad \leftarrow zero(x), S_i(x) \]
\[ bad \leftarrow last(x), S_i(x) \]

— Check that the timestamp is the key for config. There are three rules, one for
the state and two for the two counters; the one for the state is:

\[ bad \leftarrow config(t, s, c_1, c_2), config(t, s', c_1', c_2'), s \neq s' \]
— Check the configuration of the 2CM at time zero: config must have a tuple at
(0, 0, 0, 0) and there must not be any tuples in config with a zero state and non
zero times or counters.

\[ V_{zs}(s) \leftarrow zero(t), config(t, s, x, y) \]
\[ V_{zc_1}(c) \leftarrow zero(t), config(t, x, c, y) \]
\[ V_{zc_2}(c) \leftarrow zero(t), config(t, x, y, c) \]
\[ V_{ys}(t) \leftarrow zero(s), config(t, s, x, y) \]
\[ V_{yc_1}(c_1) \leftarrow zero(s), config(t, s, c_1, x) \]
\[ V_{yc_2}(c_2) \leftarrow zero(s), config(t, s, x, c_2) \]
\[ goodconfigzero : \forall x\,((V_{zs}(x) \Rightarrow S_0(x)) \wedge ((V_{zc_1}(x) \vee V_{zc_2}(x) \vee V_{ys}(x) \vee V_{yc_1}(x) \vee V_{yc_2}(x)) \Rightarrow zero(x))) \]

— For each tuple in config at time $t$ which isn't the halt state, there must also be
a tuple at time $t + 1$ in config.

\[ V_1(t) \leftarrow config(t, s, c_1, c_2), S_h(s) \]
\[ V_2(t) \leftarrow succ(t, t_2), config(t_2, s', c_1', c_2') \]
\[ hasconfignext : \forall t\,((dT(t) \wedge \neg V_1(t)) \Rightarrow V_2(t)) \]

— Check that the transitions of the 2CM are followed. For each transition $\delta(j, >, =)
= (k, pop, push)$, we include three rules, one for checking the state, one
for checking the first counter and one for checking the second counter. For the
transition in question we have for checking the state

\[ V_\delta(t') \leftarrow config(t, s, c_1, c_2), succ(t, t'), S_j(s), succ(x, c_1), zero(c_2) \]
\[ V_{\delta s}(s) \leftarrow V_\delta(t), config(t, s, c_1, c_2) \]
\[ goodstate_\delta : \forall s\,(V_{\delta s}(s) \Rightarrow S_k(s)) \]

and for the first counter, we (i) find all the times where the transition is definitely
correct for the first counter

\[ Q_{1\delta}(t') \leftarrow config(t, s, c_1, c_2), succ(t, t'), S_j(s), succ(x, c_1), zero(c_2), succ(c_1', c_1), config(t', s', c_1', c_2') \]

(ii) find all the times where the transition may or may not be correct for the first
counter

\[ Q_{2\delta}(t') \leftarrow config(t, s, c_1, c_2), succ(t, t'), S_j(s), succ(x, c_1), zero(c_2) \]

and make sure $Q_{1\delta}$ and $Q_{2\delta}$ are the same

\[ goodtrans_{\delta c_1} : \forall t\,(Q_{1\delta}(t) \Leftrightarrow Q_{2\delta}(t)) \]

Rules for the second counter are similar.

For transitions $\delta_1, \delta_2, \ldots, \delta_k$, the combination can be expressed thus:

\[ goodstate : goodstate_{\delta_1} \wedge goodstate_{\delta_2} \wedge \ldots \wedge goodstate_{\delta_k} \]
\[ goodtrans_{c_1} : goodtrans_{\delta_1 c_1} \wedge goodtrans_{\delta_2 c_1} \wedge \ldots \wedge goodtrans_{\delta_k c_1} \]
\[ goodtrans_{c_2} : goodtrans_{\delta_1 c_2} \wedge goodtrans_{\delta_2 c_2} \wedge \ldots \wedge goodtrans_{\delta_k c_2} \]

— Check that the halting state is in config.

\[ hlt(t) \leftarrow config(t, s, c_1, c_2), S_h(s) \]
\[ halt : \exists x\, hlt(x) \]

Given these views, we claim that satisfiability is undecidable for the query $\psi =
\neg bad \wedge hasPred \wedge sameDom \wedge halt \wedge goodzero \wedge goodconfigzero \wedge nempty \wedge
hassuccnext \wedge eligiblezero \wedge eligiblelast \wedge goodstate \wedge goodtrans_{c_1} \wedge goodtrans_{c_2} \wedge
hasconfignext$. □
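
The rules defining bad that concern only succ, zero and last can be evaluated directly on a small database, which shows in particular that unrelated cycles in succ are tolerated. This is a sketch of those rules only (the rules for $S_i$ and config are omitted), and the function name `is_bad` and the set-based encoding of the relations are our own assumptions.

```python
def is_bad(succ, zero, last):
    """True if one of the bad rules over succ, zero and last fires."""
    for (x, y) in succ:
        if x == y:                       # bad <- succ(x, x)
            return True
        if x in last or y in zero:       # last has no successor, zero no predecessor
            return True
        for (u, v) in succ:
            if x == u and y != v:        # two different successors
                return True
            if y == v and x != u:        # two different predecessors
                return True
    # zero and last contain at most one element each and are disjoint
    return len(zero) > 1 or len(last) > 1 or bool(zero & last)


chain = {(0, 1), (1, 2)}
with_cycle = chain | {(5, 6), (6, 5)}    # an unrelated cycle of length 2
branching = chain | {(0, 3)}             # constant 0 has two successors

results = (
    is_bad(chain, {0}, {2}),
    is_bad(with_cycle, {0}, {2}),
    is_bad(branching, {0}, {2}),
)
```

Only the branching relation is rejected; the relation with the additional cycle passes, in line with the remark that succ may contain unrelated cycles as long as there is one good chain.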

The second extension we consider is to allow "safe" negation in the conjunctive 
views, e.g. 

$V(x) \leftarrow R(x, y),\ R(y, z),\ \neg R(x, z)$

Call the first order language over such views the first order unary conjunctive$^{\neg}$ view
language. It is also undecidable, by a result in [Bailey et al. 1998].

Theorem 5.2. [Bailey et al. 1998] Satisfiability is undecidable for the first order
unary conjunctive$^{\neg}$ view query language. ■

A third possibility for increasing the expressiveness of views would be to keep 
the body as a pure conjunctive query, but allow views to have binary arity, e.g. 

V(x,y) <- R(x,y) 

This doesn't yield a decidable language either, since this language has the same
expressiveness as first order logic over binary relations, for which satisfiability is
known to be undecidable [Borger et al. 1997].

Proposition 5.3. Satisfiability is undecidable for the first order binary conjunc- 
tive view language. ■ 

A fourth possibility is to use unary conjunctive views, but allow recursive view 
definitions, e.g. 

$V(x) \leftarrow edge(x, y)$
$V(x) \leftarrow V(x) \wedge edge(y, x)$

Call this the first order unary conjunctive$^{rec}$ language. This language is also
undecidable.

Theorem 5.4. Satisfiability is undecidable for the first order unary conjunctive$^{rec}$
view language.

Proof (sketch). The proof of theorem 5.1 can be adapted by removing inequality
and instead using recursion to ensure there exists a connected chain in succ.
It then becomes more complicated, but the main property needed is that zero is
connected to last via the constants in succ. This can be expressed by

$\mathit{conn\_zero}(x) \leftarrow zero(x)$
$\mathit{conn\_zero}(x) \leftarrow \mathit{conn\_zero}(y),\ succ(y, x)$
$\exists x(last(x) \wedge \mathit{conn\_zero}(x))$

□
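The recursive view used in this sketch can be read as a least fixpoint over succ. The following Python fragment is only an illustration under stated assumptions: relations are stored as plain sets, and the function name `conn_zero` and the sample instance are hypothetical, not taken from the paper.

```python
def conn_zero(zero, succ):
    """Least fixpoint of the two rules
         conn_zero(x) <- zero(x)
         conn_zero(x) <- conn_zero(y), succ(y, x)
    where zero is a set of constants and succ a set of pairs."""
    reached = set(zero)
    changed = True
    while changed:
        changed = False
        for (y, x) in succ:
            if y in reached and x not in reached:
                reached.add(x)
                changed = True
    return reached


# Hypothetical instance: a chain 0 -> 1 -> 2 -> 3 plus a detached pair 7 -> 8.
zero = {0}
succ = {(0, 1), (1, 2), (2, 3), (7, 8)}
last = {3}

connected = conn_zero(zero, succ)
# "exists x (last(x) and conn_zero(x))" holds iff the two sets intersect.
chain_reaches_last = bool(last & connected)
```

On this instance the detached pair is never reached from zero, which is exactly the kind of disconnected chain the recursive view is meant to rule out.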
6. APPLICATIONS

6.1 Reasoning Over Ontologies 

A currently active area of research is that of reasoning over ontologies (see e.g.
[Horrocks 2005]). The aim here is to use decidable query languages for accessing
and reasoning about information and structure on the Semantic Web. In
particular, ontologies provide vocabularies which can define relationships or
associations between various concepts (classes) and also properties that link different
classes together. Description logics are a key tool for reasoning over schemas and
ontologies and, to this end, a considerable number of different description logics have
been developed. To illustrate some reasoning over a simple ontology, we adopt an
example from [Horrocks et al. 2003], describing people, countries and some
relationships. This example can be encoded in a description logic such as SHIQ and
also in the UCV query language. We show how to accomplish the latter.

— Define classes such as Country, Person, Student and Canadian. These are just
unary views defined over unary relations, e.g. $Country(x) \leftarrow country(x)$.
Observe that we can blur the distinction between unary views and unary relations
and use them interchangeably.

— State that Student is a subclass of Person.

$\forall x(Student(x) \Rightarrow Person(x))$

— State that Canada and England are both instances of the class Country. To 
accomplish this in the UCV language, we could define Canada and England as 
unary views and ensure that they are contained in the Country relation and are 
disjoint with all other classes/instances. 

— Declare Nationality as a property relating the classes Person (its domain) and
Country (its range). In the UCV language, we could model this as a binary
relation $Nationality(x, y)$ and impose constraints on its domain and range, e.g.

$\mathit{dom\_Nationality}(x) \leftarrow Nationality(x, y)$
$\mathit{range\_Nationality}(y) \leftarrow Nationality(x, y)$
$\forall x(\mathit{dom\_Nationality}(x) \Rightarrow Person(x))$
$\forall x(\mathit{range\_Nationality}(x) \Rightarrow Country(x))$

— State that Country and Person are disjoint classes.

$\forall x(Country(x) \Rightarrow \neg Person(x))$

— Assert that the class Stateless is defined precisely as those members of the class
Person that have no values for the property Nationality.

$\mathit{has\_Nationality}(x) \leftarrow Nationality(x, y)$
$\forall x(Stateless(x) \Leftrightarrow Person(x) \wedge \neg \mathit{has\_Nationality}(x))$
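To make the encoding concrete, the statements in this list can be checked against a single database instance. The sketch below is illustrative only: it evaluates the views and constraints on one hypothetical instance (model checking) and is not the satisfiability procedure of the paper. All individual names and the dictionary keys are assumptions.

```python
# Hypothetical database instance; all individuals are made up for illustration.
person = {"ann", "bob", "cy"}
student = {"ann"}
country = {"canada", "england"}
nationality = {("ann", "canada"), ("bob", "england")}

# Unary conjunctive views: unary projections of the binary relation Nationality.
dom_nationality = {x for (x, y) in nationality}
range_nationality = {y for (x, y) in nationality}
has_nationality = dom_nationality  # same body as dom_Nationality

# Stateless(x) <=> Person(x) and not has_Nationality(x)
stateless = person - has_nationality

# Each first order constraint from the list becomes a set-theoretic check.
constraints = {
    "student_subclass_of_person": student <= person,
    "domain_is_person": dom_nationality <= person,
    "range_is_country": range_nationality <= country,
    "country_person_disjoint": not (country & person),
}
all_hold = all(constraints.values())
```

Here the only person without a Nationality tuple ends up in `stateless`, and every constraint holds, so this instance is a model of the ontology statements.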

The above types of statements are reasonably simple to express. In order to
achieve more expressiveness, property chaining and property composition have been
identified as important reasoning features. To this end, integration of rule-based KR
and DL-based KR is an active area of research. The UCV query language has the
advantage of being able to express certain types of property chaining that cannot
be expressed in the description logic SHIQ [Horrocks et al. 2003]. For example

— An uncle is precisely a parent's brother.

$uncle_1(z) \leftarrow parent(x, y),\ brother(x, z)$
$uncle_2(z) \leftarrow parent(x, y),\ brother(z, x)$
$uncle(z) \Leftrightarrow uncle_1(z) \vee uncle_2(z)$
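The two uncle views are joins of parent and brother followed by a projection onto z. A minimal Python sketch, assuming relations are stored as sets of pairs and using a made-up family instance, could evaluate them as follows.

```python
# Hypothetical instance: parent(x, y) means x is a parent of y;
# brother is stored in one direction only, hence the two views.
parent = {("pat", "kim")}
brother = {("pat", "sam"), ("lee", "pat")}

# uncle_1(z) <- parent(x, y), brother(x, z)
uncle_1 = {z for (x, y) in parent for (x2, z) in brother if x2 == x}
# uncle_2(z) <- parent(x, y), brother(z, x)
uncle_2 = {z for (x, y) in parent for (z, x2) in brother if x2 == x}

# uncle(z) <=> uncle_1(z) or uncle_2(z)
uncle = uncle_1 | uncle_2
```

Each view is unary even though its body chains two binary relations, which is the property chaining that the text refers to.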

We consequently believe the UCV query language has some intriguing potential to 
be used as a reasoning component for ontologies, possibly to supplement description 
logics for some specialized applications. We leave this as an open area for future 
investigation. 

6.2 Containment and Equivalence 

We now briefly examine the application of our results to query containment.
Theorem 4.1 implies we can test, in 2-NEXPTIME, whether $Q_1(x) \subseteq Q_2(x)$ under the
constraints $C_1 \wedge C_2 \wedge \ldots \wedge C_n$, where $Q_1, Q_2, C_1, \ldots, C_n$ are all first order
unary conjunctive view queries. This just amounts to testing whether the sentence
$\exists x(Q_1(x) \wedge \neg Q_2(x)) \wedge C_1 \wedge \ldots \wedge C_n$ is unsatisfiable. Equivalence of $Q_1(x)$ and
$Q_2(x)$ can be tested with containment tests in both directions.

Of course, we can also show that testing the containment $Q_1 \subseteq Q_2$ is undecidable
if $Q_1$ and $Q_2$ are first order unary conjunctive$^{\neq}$ view queries, first order unary
conjunctive$^{\neg}$ view queries or first order unary conjunctive$^{rec}$ view queries.

Containment of queries with negation was first considered in [Sagiv and Yannakakis
1980]. There it was essentially shown that the problem is decidable for
queries which do not apply projection to subexpressions with difference. Such a
language is disjoint from ours, since it cannot express a sentence such as
$\exists y\, V_4(y) \wedge \neg\exists x(V_1(x) \wedge \neg V_2(x))$, where $V_1$ and $V_2$ are views defined over several variables.

6.3 Inclusion Dependencies 

Unary inclusion dependencies were identified as useful in [Cosmadakis et al. 1990].
They take the form $R[x] \subseteq S[y]$. If we allow R and S above to be unary conjunctive
view queries, we could obtain unary conjunctive view containment dependencies.
Observe that the unary views are actually unary projections of the join of one or
more relations.

We can also define a special type of dependency called a proper first order
unary conjunctive inclusion dependency, having the form $Q_1(x) \subset Q_2(x)$, where
$Q_1$ and $Q_2$ are first order unary conjunctive view queries with one free variable. If
$\{d_1, \ldots, d_k\}$ is a set of such dependencies, then it is straightforward to test whether
they imply another dependency $d_x$, by testing the satisfiability of an appropriate
first order unary conjunctive view query.

Theorem 6.1. Implication for the class of unary conjunctive view containment 
dependencies with subset and proper subset operators is i) decidable in 2-NEXPTIME 
and ii) finitely controllable. ■ 

The results from [Cosmadakis et al. 1990] show that implication is decidable 
in polynomial time, but not finitely controllable, for either of the combinations 
i) functional dependencies plus unary inclusion dependencies, ii) full implication 
dependencies plus unary inclusion dependencies. In contrast, the stated complexity 
in the above theorem is much higher, due to the increased expressiveness of the
dependencies, yet interestingly the class is finitely controllable. 

We might also consider unary conjunctive$^{\neq}$ containment dependencies. The tests
in the proof of theorem 5.1 for the 2CM can be written in the form $Q_1(x) \subseteq Q_2(x)$,
with the exception of the non-emptiness constraints, which must use the proper
subset operator. Interestingly also, we can see from the proof of theorem 5.1
that adding the ability to express functional dependencies would also result in
undecidability. We can summarise these observations in the following theorem and
its corollary.

Theorem 6.2. Implication is undecidable for unary conjunctive$^{\neq}$ (or conjunctive$^{\neg}$)
view containment dependencies with the subset and the proper subset operators. ■

Corollary 6.3. Implication is undecidable for the combination of unary conjunctive
view containment dependencies plus functional dependencies.

6.4 Active Rule Termination 

The languages in this paper have their origins in [Bailey et al. 1998], where active 
database rule languages based on views were studied. The decidability result for 
first order unary conjunctive views can be used to positively answer an open ques- 
tion raised in [Bailey et al. 1998], which essentially asked whether termination is 
decidable for active database rules expressed using unary conjunctive views. 

7. EXPRESSIVE POWER OF THE UCV LANGUAGE 

As we have seen in the previous sections, the logic UCV is quite suitable for reasoning
about hereditary information such as "x is a grandchild of y" over family trees.
This is due to the fact that UCV can express the existence of a directed walk of
length k in the graph, for any fixed positive integer k. It is therefore natural to
also ask what is inexpressible in the logic. In this section, we describe a
game-theoretic technique for proving inexpressibility results for UCV. First, we show an
easy adaptation of Ehrenfeucht-Fraisse games for proving that a boolean query is
inexpressible in $UCV(\sigma, \mathcal{V})$ for a signature $\sigma$ and a finite view set $\mathcal{V}$ over $\sigma$. Second,
we extend this result for proving that a boolean query is inexpressible in $UCV(\sigma)$.
An inexpressibility result of the second kind is clearly more interesting, as it is
independent of our choice of the view set $\mathcal{V}$ over $\sigma$. Moreover, such a result places
an ultimate limit on what can be expressed by UCV queries. Although it can be
adapted to any class C of structures, we shall only state our theorem for proving
inexpressibility results in UCV over all finite structures. For this section only, we
shall use $STRUCT(\sigma)$ to denote the set of all finite $\sigma$-structures.

Our first goal is quite easy to achieve. Recall that each view set $\mathcal{V}$ over $\sigma$ induces
a mapping $\Lambda : STRUCT(\sigma) \to STRUCT(\mathcal{V})$ as defined in section 2.

Theorem 7.1. Let $A, B \in STRUCT(\sigma)$, and let $\Lambda : STRUCT(\sigma) \to STRUCT(\mathcal{V})$ be
the mapping induced by $\mathcal{V}$. Then, the following statements are equivalent:

(1) A and B agree on $UCV(\sigma, \mathcal{V})$.

(2) $\Lambda(A) \equiv_1^{UFO(\mathcal{V})} \Lambda(B)$ (i.e. they agree on $UFO(\mathcal{V})$ formulas of quantifier rank
1).

Proof. Immediate from lemma 2.1 and lemma 3.1. □

So, to prove that a boolean query Q is not expressible in $UCV(\sigma, \mathcal{V})$, it suffices to
find two $\sigma$-structures A and B such that $\Lambda(A) \equiv_1^{UFO(\mathcal{V})} \Lambda(B)$, but A and B do not agree
on Q. In turn, to show that $\Lambda(A) \equiv_1^{UFO(\mathcal{V})} \Lambda(B)$, we can use Ehrenfeucht-Fraisse
games.

We now turn to the second task. Let us begin by stating an obvious corollary of
the preceding theorem.

Corollary 7.2. Let $A, B \in STRUCT(\sigma)$. For any view set $\mathcal{V}$, let
$\Lambda_{\mathcal{V}} : STRUCT(\sigma) \to STRUCT(\mathcal{V})$ be the induced mapping. Then, the following
statements are equivalent:

(1) A and B agree on $UCV(\sigma)$.

(2) For any view set $\mathcal{V}$ over $\sigma$, we have $\Lambda_{\mathcal{V}}(A) \equiv_1^{UFO(\mathcal{V})} \Lambda_{\mathcal{V}}(B)$.

This corollary is not of immediate use. Namely, checking the second statement is
a daunting task, as there are infinitely many possible view sets $\mathcal{V}$ over $\sigma$. Instead,
we shall propose a sufficient condition for this, which employs the easy direction of
the well-known homomorphism preservation theorem (see [Hodges 1997]).

Definition 7.3. A formula $\phi$ over a vocabulary $\sigma$ is said to be preserved under
homomorphisms, if for any $A, B \in STRUCT(\sigma)$ the following statement holds:
whenever $\bar{a} = (a_1, \ldots, a_m) \in \phi(A)$ and h is a homomorphism from A to B, it is
the case that $h(\bar{a}) = (h(a_1), \ldots, h(a_m)) \in \phi(B)$.

Lemma 7.4. Conjunctive queries are preserved under homomorphisms. 
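As an illustration of this lemma on one small case, the sketch below evaluates a unary conjunctive view by brute force and checks that its answers are carried along a homomorphism. The helper names `eval_unary_view` and `is_homomorphism`, the view body and both graphs are assumptions made for this example; the code checks one instance and proves nothing in general.

```python
from itertools import product


def eval_unary_view(body, head_var, domain, relations):
    """Answers of a unary conjunctive view, by trying every assignment.
    body is a list of atoms (relation_name, (var, ...))."""
    variables = sorted({v for _, args in body for v in args})
    answers = set()
    for values in product(sorted(domain), repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(tuple(env[v] for v in args) in relations[name]
               for name, args in body):
            answers.add(env[head_var])
    return answers


def is_homomorphism(h, rel_a, rel_b):
    """True if h maps every tuple of every relation of A into B."""
    return all(tuple(h[v] for v in tup) in rel_b[name]
               for name, tuples in rel_a.items() for tup in tuples)


# Hypothetical view V(x) <- E(x, y), E(y, z) over two small graphs.
body = [("E", ("x", "y")), ("E", ("y", "z"))]
dom_a, rel_a = {0, 1, 2}, {"E": {(0, 1), (1, 2)}}
dom_b, rel_b = {0, 1}, {"E": {(0, 1), (1, 0)}}
h = {0: 0, 1: 1, 2: 0}

view_a = eval_unary_view(body, "x", dom_a, rel_a)
view_b = eval_unary_view(body, "x", dom_b, rel_b)
preserved = is_homomorphism(h, rel_a, rel_b) and {h[a] for a in view_a} <= view_b
```

The brute-force evaluator is exponential in the number of variables in the body, which is acceptable only because the example is tiny.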

Theorem 7.5. Let $A, B \in STRUCT(\sigma)$. To prove that $\Lambda_{\mathcal{V}}(A) \equiv_1^{UFO(\mathcal{V})} \Lambda_{\mathcal{V}}(B)$
for all $\sigma$-view sets $\mathcal{V}$, it is sufficient to show that

(1) For every $a \in A$, there exists a homomorphism h from A to B and a
homomorphism g from B to A such that $g(h(a)) = a$.

(2) For every $b \in B$, there exists a homomorphism h from A to B and a
homomorphism g from B to A such that $h(g(b)) = b$.

Proof. Take an arbitrary $\sigma$-view set $\mathcal{V}$, and write $\Lambda$ for $\Lambda_{\mathcal{V}}$. We use an
Ehrenfeucht-Fraisse game argument. Suppose Spoiler places a pebble on an element a of $\Lambda(A)$,
whose domain is A. Then, the first assumption tells us that there exist homomorphisms
$h : A \to B$ and $g : B \to A$ such that $g(h(a)) = a$. Duplicator may respond by placing the
other pebble from the same pair on the element $h(a)$ of $\Lambda(B)$. To show that this
response is winning, we need to prove that $a \mapsto h(a)$ defines an isomorphism between the
substructures of $\Lambda(A)$ and $\Lambda(B)$ induced by, respectively, the sets $\{a\}$ and $\{h(a)\}$.
Let $V \in \mathcal{V}$. It is enough to show that $a \in V(A)$ iff $h(a) \in V(B)$. If $a \in V(A)$, then
we have $h(a) \in V(B)$ by lemma 7.4. Similarly, if $h(a) \in V(B)$, lemma 7.4 applied to g
implies that $a = g(h(a)) \in V(A)$.

For the case where Spoiler plays an element of B, we can use the same argument
with the aid of the second assumption above. In either case, we have
$\Lambda(A) \equiv_1^{UFO(\mathcal{V})} \Lambda(B)$. □
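For small finite graphs, the two conditions of theorem 7.5 can be checked mechanically by enumerating all homomorphisms in both directions. The Python sketch below does this by brute force. The function names are hypothetical, and the two graphs are illustrative stand-ins chosen for this example (one transitive, one not); they are not taken from the paper.

```python
from itertools import product


def homomorphisms(dom_a, edges_a, dom_b, edges_b):
    """Yield every map h: dom_a -> dom_b sending each edge of A to an edge of B."""
    dom_a = sorted(dom_a)
    for images in product(sorted(dom_b), repeat=len(dom_a)):
        h = dict(zip(dom_a, images))
        if all((h[u], h[v]) in edges_b for (u, v) in edges_a):
            yield h


def round_trip_condition(dom_a, edges_a, dom_b, edges_b):
    """Check conditions (1) and (2) of theorem 7.5 by exhaustive search."""
    homs_ab = list(homomorphisms(dom_a, edges_a, dom_b, edges_b))
    homs_ba = list(homomorphisms(dom_b, edges_b, dom_a, edges_a))
    cond1 = all(any(g[h[a]] == a for h in homs_ab for g in homs_ba)
                for a in dom_a)
    cond2 = all(any(h[g[b]] == b for h in homs_ab for g in homs_ba)
                for b in dom_b)
    return cond1 and cond2


def is_transitive(edges):
    return all((x, z) in edges
               for (x, y) in edges for (y2, z) in edges if y == y2)


# Assumed graphs: A is a transitive triangle; B is two copies of A plus the
# extra edge (0, 4), which breaks transitivity because (0, 5) is missing.
dom_a, edges_a = {0, 1, 2}, {(0, 1), (1, 2), (0, 2)}
dom_b = {0, 1, 2, 3, 4, 5}
edges_b = {(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (0, 4)}

conditions_hold = round_trip_condition(dom_a, edges_a, dom_b, edges_b)
```

If the conditions hold while A and B disagree on a boolean query, theorem 7.5 together with corollary 7.2 gives that the query is not expressible in UCV over the signature. The enumeration is exponential in the domain sizes, so this is only practical for very small graphs.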
This theorem allows us to give easy inexpressibility proofs for a variety of first-order
queries. We illustrate this with three first-order queries over
directed graphs (i.e. structures with one binary relation E).

Example 7.1. We show that the formula $SYM = \forall x, y(E(x, y) \leftrightarrow E(y, x))$
accepting graphs with symmetric E is not expressible in $UCV(\sigma)$. To do this, consider
the graphs A and B defined as follows

[Figure: the graphs A and B]

Obviously, the graph $E^A$ is symmetric, while $E^B$ is not. Consider the functions
$h_1, h_2 : A \to B$ and $g : B \to A$ defined as

— $h_1(a) = h_1(c) = a$ and $h_1(b) = h_1(d) = b$,

— $h_2(a) = h_2(c) = c$ and $h_2(b) = h_2(d) = d$, and

— for $i \in B$, $g(i) = i$.

It is easy to verify that $h_1$ and $h_2$ are homomorphisms from A to B, whereas g
is a homomorphism from B to A. Now, for $x \in \{a, b\}$, we have $g(h_1(x)) = x$ and
$h_1(g(x)) = x$. For $x \in \{c, d\}$, we have $g(h_2(x)) = x$ and $h_2(g(x)) = x$. So, by
theorem 7.5 and corollary 7.2, we conclude that SYM is not expressible in $UCV(\sigma)$
over all finite directed graphs.

Example 7.2. We now show that the transitivity query

$TRANS = \forall x, y, z(E(x, y) \wedge E(y, z) \rightarrow E(x, z))$

is not expressible in $UCV(\sigma)$. To do this, consider the graphs A and B defined as

[Figure: the graphs A and B]

It is obvious that $A \models TRANS$, and it is not the case that $B \models TRANS$. Consider
the homomorphisms $h_1, h_2$ from A to B, and the homomorphism g from B to A
defined as

— for $i \in A$, $h_1(i) = i$;

— for $i \in A$, $h_2(i) = i + 3$; and

— for $i \in B$, $g(i) = i \bmod 3$.

Then, for $i \in A$, we have $g(h_1(i)) = i$. Conversely, suppose that $i \in B$. If $i = 0, 1, 2$,
then $h_1(g(i)) = i$. Similarly, if $i = 3, 4, 5$, then $h_2(g(i)) = i$. So, by theorem 7.5 and
corollary 7.2, transitivity is not expressible in $UCV(\sigma)$ over finite directed graphs.
Example 7.3. The query $\forall x, y\, E(x, y)$ is also not expressible in $UCV(\sigma)$. It is
easy to apply theorem 7.5 and corollary 7.2 on the following graphs to verify this
fact.

[Figure: the graphs for Example 7.3]


8. RELATED WORK 

Satisfiability of first order logic has been thoroughly investigated in the context
of the classical decision problem [Borger et al. 1997]. The main thrust there has
been determining for which quantifier prefixes first order languages are decidable.
We are not aware of any result of this type which could be used to demonstrate
decidability of the first order unary conjunctive view language. Instead, our result
is best classified as a new decidable class generalising the traditional decidable
unary first-order language (the Lowenheim class [Lowenheim 1915]). Use of the
Lowenheim class itself for reasoning about schemas is described in [Theodoratos
1996], where applications towards checking intersection and disjointness of object
oriented classes are given.

As observed earlier, description logics are important logics for expressing
constraints on desired models. In [Calvanese et al. 1998], the query containment
problem is studied in the context of the description logic $\mathcal{DLR}_{reg}$. There are certain
similarities between this and the first order (unary) view languages we have
studied in this paper. The key difference appears to be that although $\mathcal{DLR}_{reg}$ can be
used to define view constraints, these constraints cannot express unary conjunctive
views (since assertions do not allow arbitrary projection). Furthermore, $\mathcal{DLR}_{reg}$
can express functional dependencies on a single attribute, a feature which would
make the UCV language undecidable (see proof of theorem 5.1). There is a result in
[Calvanese et al. 1998], however, showing undecidability for a fragment of $\mathcal{DLR}_{reg}$
with inequality, which could be adapted to give an alternative proof of theorem 5.1
(although inequality is used there in a slightly more powerful way).

Another interesting family of decidable logics are guarded logics. The Guarded
Fragment [Andreka et al. 1998] and the Loosely Guarded Fragment [Van Bentham
1997] are both logics that have the finite model property [Hodkinson 2002].
The philosophy of UCV is somewhat similar to these guarded logics, since the
decidability of UCV also arises from certain restrictions on quantifier use. In
terms of expressiveness though, guarded logics seem distinct from UCV formulas,
not being able to express cyclic views, such as $\exists x(V(x))$, where
$V(x) \leftarrow R(x, y),\ R(y, z),\ R(z, z'),\ R(z', x)$.

Another area of work that deals with complexity of views is the view consistency
problem, with results given in [Abiteboul and Duschka 1998]. This involves
determining whether there exists an underlying database instance that realises a
specific (bounded) view instance. The problem we have focused on in this paper
is slightly more complicated: testing satisfiability of a first order view query asks
whether there exists an (unbounded) view instance that makes the
query true. This explains how satisfiability can be undecidable for first order unary
conjunctive$^{\neg}$ view queries, while view consistency for non recursive datalog$^{\neg}$ views
is in NP. Monadic views have been recently examined in [Nash et al. 2007], where
they were shown to exhibit nice properties in the context of answering and rewriting
conjunctive queries using only a set of views. This is an interesting counterpoint to
the results of this paper, which demonstrate how monadic views can form the basis
of a decidable fragment of first order logic.

Table 1: Summary of Decidability Results for First Order View Languages

View language                          Satisfiability
Unary Conjunctive View                 Decidable
Unary Conjunctive$^{\cup}$ View        Decidable
Unary Conjunctive$^{\neq}$ View        Undecidable
Unary Conjunctive$^{rec}$ View         Undecidable
Unary Conjunctive$^{\neg}$ View        Undecidable [Bailey et al. 1998]
Binary Conjunctive View                Undecidable




9. SUMMARY AND FURTHER WORK 

In this paper, we have introduced a new decidable language based on the use of 
unary conjunctive views embedded within first order logic. This is a powerful gen- 
eralisation of the well known fragment of first order logic using only unary relations 
(the Lowenheim class). We also showed that our new class is maximal, in the
sense that increasing the expressivity of views is not possible without undecidability
resulting. Table 1 provides a summary of our decidability results. Note that
the Unary Conjunctive$^{\cup}$ View language corresponds to the extension of UCV by
allowing disjunction in the view definition.

We feel that the decidable case we have identified is sufficiently natural and
interesting to be of practical as well as theoretical interest.

An interesting open problem for future work is to investigate the decidability of
an extension to the first order unary conjunctive view language, when equality is
allowed to be used outside of the unary views (i.e. included in the first order part).
An example formula in this new language is

$\forall X, Y(V_1(X) \wedge V_2(Y) \Rightarrow X \neq Y)$

We conjecture this extended language is decidable, but do not currently have a 
proof. 

For other future work, we believe it would be worthwhile to investigate
relationships with description logics and also examine alternative ways of introducing
negation into the UCV language. One possibility might be to allow views of arity
zero to specify description logic like constraints, such as $R_1(x, y) \subseteq R_2(x, y)$.

Finally, there is still an exponential gap between the upper bound complexity 
of 2-NEXPTIME and lower bound complexity of NEXPTIME-hardness that we 
derived. The primary reason for this exponential blow-up is the enumeration of all 
subviews of the views that are present in the formula, which we need for the proof. 